# [Filtering](https://www.youtube.com/watch?v=Lw2rlcxScZY&list=PL-osiE80TeTt2d9bfVyTiXJA-UTHn6WwU&index=131&ab_channel=CoreySchafer) and [updating](https://www.youtube.com/watch?v=DCDe29sIKcE&list=PL-osiE80TeTt2d9bfVyTiXJA-UTHn6WwU&index=132&ab_channel=CoreySchafer) a ``DataFrame``: examples

[stackoverflow__survey_2020](https://drive.google.com/file/d/1dfGerWeWkcyQ9GX9x20rdSGj7WtEpzBB/view)

In [None]:
import pandas as pd

## 1. Exploring the data structure
----------------------------------

In [None]:
df = pd.read_csv('data/survey_results_public.csv')
df.head()

In [None]:
df.shape

In [None]:
df.info()

In [None]:
pd.set_option('display.max_columns', 61)

In [None]:
df.head(5)

In [None]:
schema_df = pd.read_csv('data/survey_results_schema.csv')

In [None]:
schema_df

In [None]:
schema_df.loc[0:2]

In [None]:
schema_df['QuestionText'][1]

In [None]:
schema_df.iloc[0:2, 1]

In [None]:
schema_df.loc[[0,1], 'QuestionText']

In [None]:
df['Hobbyist']

#### 1.1. Counting
-----------------

In [None]:
df['Hobbyist'].value_counts()

In [None]:
df.loc[0, 'Hobbyist']

In [None]:
df.loc[0:1, 'Hobbyist']

In [None]:
df.loc[0:1, 'Hobbyist':'Age1stCode']

#### 1.2. Reindexing 
__________________

In [None]:
df.set_index('Respondent', inplace=True)

In [None]:
df

In [None]:
df.loc[[0,1,2], 'Hobbyist':'Age1stCode'] # mistake: raises KeyError, the index is now 'Respondent', which has no label 0

In [None]:
df.loc[[1,2], 'Hobbyist':'Age1stCode']
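The two cells above differ only in the labels passed to `.loc`. A minimal sketch of why the first one fails, using a tiny made-up frame (the `toy` data is an assumption for illustration; only the 1-based `Respondent` index mirrors the survey). In recent pandas versions, a list of labels containing a missing label raises `KeyError`:

```python
import pandas as pd

# Hypothetical toy data that mimics the survey's 1-based 'Respondent' ids
toy = pd.DataFrame({'Respondent': [1, 2, 3],
                    'Hobbyist': ['Yes', 'No', 'Yes']}).set_index('Respondent')

try:
    toy.loc[[0, 1], 'Hobbyist']          # label 0 does not exist in the index
    missing_label_failed = False
except KeyError:
    missing_label_failed = True

by_label = toy.loc[[1, 2], 'Hobbyist']   # .loc selects by index label
by_position = toy.iloc[[0, 1], 0]        # .iloc selects by integer position

print(missing_label_failed)
print(by_label.equals(by_position))
```

After `set_index`, `.loc` follows the new labels, while `.iloc` still counts rows from position 0.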

In [None]:
df.reset_index(inplace=True)

In [None]:
df.head(5)

## 2. Exploring a `DataFrame` using conditional filters
----------------------------------------------------------

#### 2.1. Filtering by masks
------------------

In [None]:
filt = df['Age'] > 20
filt.head(10)

In [None]:
type(filt)

* A boolean Series used as a mask:
  * `True` -- the row meets the filter criteria
  * `False` -- the row does not meet the filter criteria
* Applying the mask to a DataFrame returns a new, **temporary** DataFrame that contains only the rows meeting the filter; the original DataFrame is unchanged
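A minimal sketch of the mask idea on a tiny made-up frame (the `toy` data and its values are assumptions for illustration, not taken from the survey):

```python
import pandas as pd

# Hypothetical toy data, not the survey itself
toy = pd.DataFrame({'Age': [18, 25, 31, 19],
                    'Hobbyist': ['Yes', 'No', 'Yes', 'Yes']})

mask = toy['Age'] > 20           # boolean Series aligned with toy's index
older = toy[mask]                # new DataFrame holding only the True rows

print(mask.tolist())             # [False, True, True, False]
print(older.index.tolist())      # [1, 2]
print(len(toy))                  # 4: the original frame is unchanged
```

The filtered frame keeps the original index labels, which is why the surviving rows are still labelled 1 and 2.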

In [None]:
df[filt].head(5)

In [None]:
type(df[filt])

In [None]:
df.loc[filt].head(5)

In [None]:
df.loc[filt, 'Hobbyist'].head(5)

#### 2.2. Compound filters
____________________________

In [None]:
filt = (df['Age'] > 20) & (df['Hobbyist'] == 'Yes') & (df['CompTotal'] > 100000)
filt

In [None]:
filt.value_counts()

In [None]:
df.loc[filt, ['CompTotal', 'Country', 'LanguageWorkedWith']].head(5)

#### 2.3. Filtering a `DataFrame` by a `list` of allowed values with `isin`
_______________________________________________________________

In [None]:
countries = ['United States', 'Canada', 'India', 'Germany']
filt = filt & df['Country'].isin(countries)

In [None]:
df.loc[filt, ['CompTotal', 'Country', 'LanguageWorkedWith']].head(5)
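A small sketch of how `isin` combines with another condition, on made-up data (the `toy` frame and the `wanted` list are assumptions for illustration). Missing values never match the list, so they come out as `False`:

```python
import pandas as pd

# Hypothetical toy data, not the survey itself
toy = pd.DataFrame({'Country': ['India', 'Ukraine', 'Canada', None],
                    'CompTotal': [120000, 90000, 150000, 200000]})

wanted = ['Canada', 'India']
in_list = toy['Country'].isin(wanted)            # True where Country is in wanted
selected = toy.loc[in_list & (toy['CompTotal'] > 100000), 'Country']
excluded = toy.loc[~in_list, 'CompTotal']        # ~ inverts the mask

print(in_list.tolist())      # [True, False, True, False]
print(selected.tolist())     # ['India', 'Canada']
print(excluded.tolist())     # [90000, 200000]
```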

In [None]:
filt = df['Country'] == 'Ukraine'

In [None]:
df.loc[filt, ['CompTotal', 'Country', 'LanguageWorkedWith']].head(5)

In [None]:
filt = filt & (df['Hobbyist'] == 'Yes') & (df['CompTotal'] > 100000)

In [None]:
df.loc[filt, ['CompTotal', 'Country', 'LanguageWorkedWith']].head(5)

## 3. Processing column content using `pd.Series.str`
___________________________________________________________________

#### 3.1. Overview of ``pd.Series.str``
________________________________________

In [None]:
type(pd.Series.str)

In [None]:
pd.Series.str?

In [None]:
dir(pd.Series.str)

In [None]:
df['LanguageWorkedWith']

In [None]:
filt = (df['LanguageWorkedWith'].str.contains('Python', na=False) & (df['CompTotal'] > 100000) & df['Country'].isin(countries)
        & df['LanguageWorkedWith'].str.contains(r'C\+\+', na=False))  # raw string: '+' is a regex metacharacter and must be escaped

In [None]:
df.loc[filt, ['CompTotal', 'Country', 'LanguageWorkedWith']]
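The filter above relies on `str.contains` treating its pattern as a regular expression by default. A minimal sketch on a made-up Series (the `langs` values are assumptions for illustration) shows two equivalent ways to match the literal text `C++`, and how `na=False` handles missing entries:

```python
import pandas as pd

# Hypothetical toy data in the survey's semicolon-separated style
langs = pd.Series(['C;Python', 'C++;Python', None, 'C++;Java'])

as_regex = langs.str.contains(r'C\+\+', na=False)               # escaped regex pattern
as_literal = langs.str.contains('C++', regex=False, na=False)   # plain substring match
both = as_regex & langs.str.contains('Python', na=False)

print(as_regex.tolist())     # [False, True, False, True]
print(as_literal.tolist())   # [False, True, False, True]
print(both.tolist())         # [False, True, False, False]
```

With `na=False`, the missing entry becomes `False` instead of a missing value, so the result can be used directly as a mask.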

#### 3.2. Updating columns
_____________________

In [None]:
df.columns=[x.upper() for x in df.columns]

In [None]:
df.head(5)

In [None]:
df.columns=[x.lower() for x in df.columns]

In [None]:
df.head(5)

In [None]:
df = pd.read_csv('data/survey_results_public.csv')  # reload to restore the original column names
df.head()

In [None]:
df.rename(columns={'CompTotal': 'Salary'}, inplace=True)
df.head(5)

#### 3.3. Updating the value of a cell 
_______________________________

In [None]:
df.loc[2, ['Age', 'Age1stCode']]=[19, 9]

In [None]:
df.head(5)

In [None]:
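df.at[2, 'Age'] = 20         # .at reads or writes a single scalar cell
df.at[2, 'Age1stCode'] = 8   # to set several columns at once, use .loc as above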

In [None]:
df.head(5)

## 4. Processing column content using `apply()`
--------------------

#### 4.1. Invoking a function on every element of a Series with ``pd.Series.apply``
------------------

In [None]:
pd.Series.apply?

In [None]:
# df['Hobbyist'].apply(len) raises TypeError on missing values (NaN is a float), so drop them first
df['Hobbyist'].dropna().apply(len)

In [None]:
def update_n(s, v):
    return s+v

In [None]:
df['Salary']

In [None]:
df['Salary'].apply(update_n, args=(321,))

In [None]:
df.head(10)

In [None]:
df['Salary'] = df['Salary'].apply(update_n, args=(321,))

In [None]:
df.head(10)

In [None]:
df['Salary'].apply(lambda x: x+123)

In [None]:
len(df['Salary'])

In [None]:
df['Salary'].min()

In [None]:
df['Hobbyist'].map({'Yes': True, 'No':False})

In [None]:
df.head()
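A brief sketch of `Series.map` with a dictionary, on made-up answers (the `answers` values are assumptions for illustration). Values that have no key in the dictionary, including missing ones, become `NaN`, and the original Series is left untouched unless the result is assigned back:

```python
import pandas as pd

# Hypothetical toy data; 'Maybe' is deliberately absent from the mapping
answers = pd.Series(['Yes', 'No', None, 'Maybe'])

mapped = answers.map({'Yes': True, 'No': False})

print(mapped.isna().tolist())    # [False, False, True, True]
print(answers.tolist())          # ['Yes', 'No', None, 'Maybe']
```

To keep the converted values in a frame, the result would have to be assigned, for example `df['Hobbyist'] = df['Hobbyist'].map(...)`.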

#### 4.2. Applying a function to a `DataFrame` using ``pd.DataFrame.apply()``
--------------------------
* Applies a function along an axis of the `DataFrame` (to each row or each column)
* The objects passed to the function are Series:
    * ``axis=0`` or ``'index'`` -- apply the function to each column
    * ``axis=1`` or ``'columns'`` -- apply the function to each row
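A minimal sketch of the two axes on a made-up frame (the `toy` data and the helper `value_range` are hypothetical names introduced for illustration). The helper receives one Series per call: a whole column for `axis=0`, a whole row for `axis=1`:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

def value_range(s):
    """Return max minus min of the Series passed in by apply."""
    return s.max() - s.min()

per_column = toy.apply(value_range)            # axis=0: one result per column
per_row = toy.apply(value_range, axis=1)       # axis=1: one result per row

print(per_column.to_dict())    # {'A': 2, 'B': 20}
print(per_row.tolist())        # [9, 18, 27]
```

The result is indexed by column names for `axis=0` and by row labels for `axis=1`.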

In [None]:
pd.DataFrame.apply?

In [None]:
df.apply(len)

In [None]:
df.apply(len, axis=1)

In [None]:
dff = pd.DataFrame([[4, 9],] * 3, columns=['A', 'B'])
dff

In [None]:
import numpy as np

In [None]:
dff.apply(np.sqrt)

In [None]:
dff.apply(np.sum)  # default axis=0: one sum per column

In [None]:
dff.apply(np.sum, axis=1)