[9 Awesome Python Pandas Usages Every Data Scientists Should Know](https://betterprogramming.pub/9-awesome-python-pandas-usages-every-data-scientists-should-know-62911eed81e9)

`lr_simple_life_satisfaction_1.ipynb`

# Create a dataframe from a dictionary and display it

In [27]:
import pandas as pd

employees = {
    'Name of Employee': ['Jon','Mark','Tina','Maria','Bill','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill','Jon','Mark','Tina','Maria','Bill'],
    'Sales': [1000,300,400,500,1000,500,700,50,60,1000,900,750,200,300,1000,900,250,750,50],
    'Quarter': [1,1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,4],
    'Country': ['US','Japan','Brazil','UK','Brazil','Japan','Brazil','US','US','US','Japan','Brazil','UK','Brazil','Japan','Japan','Brazil','UK','US']
    }

df = pd.DataFrame(employees, columns= ['Name of Employee','Sales','Quarter','Country'])

In [28]:
print(df.head(3))

  Name of Employee  Sales  Quarter Country
0              Jon   1000        1      US
1             Mark    300        1   Japan
2             Tina    400        1  Brazil


In [30]:
df  # display the dataframe without the print() function

  Name of Employee  Sales  Quarter Country
0              Jon   1000        1      US
1             Mark    300        1   Japan
2             Tina    400        1  Brazil
3            Maria    500        1      UK
4             Bill   1000        1  Brazil
5             Mark    500        2   Japan
6             Tina    700        2  Brazil
7            Maria     50        2      US
8             Bill     60        2      US
9              Jon   1000        3      US
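
As a minimal sketch of what the `columns=` argument does when building a dataframe from a dictionary, the example below uses a hypothetical `toy` dictionary (not the employees data above) and shows that `columns=` selects which keys to keep and in what order:

```python
import pandas as pd

# Hypothetical toy data, smaller than the employees dictionary above
toy = {
    'Name': ['Jon', 'Mark'],
    'Sales': [1000, 300],
    'Quarter': [1, 1],
}

# columns= picks which dictionary keys become columns, and in what order
reordered = pd.DataFrame(toy, columns=['Quarter', 'Name'])

print(reordered.columns.tolist())  # ['Quarter', 'Name']
print(reordered.shape)             # (2, 2)
```

If `columns=` is omitted, the columns follow the order of the dictionary keys.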


# Remove columns and rows

Let's remove one column and store the result in a new dataframe:

In [8]:
df2 = df.drop('Quarter', axis = 1) # Remove a column with name "Quarter"
df2.head(3)

  Name of Employee  Sales Country
0              Jon   1000      US
1             Mark    300   Japan
2             Tina    400  Brazil


Let's remove two rows with specified **indices**:

In [9]:
df3 = df.drop(labels=[0, 1], axis=0)  # Remove the rows labeled 0 and 1 (row labels are the index values)
df3.head(3)

  Name of Employee  Sales  Quarter Country
2             Tina    400        1  Brazil
3            Maria    500        1      UK
4             Bill   1000        1  Brazil
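
The two `drop` calls above can also be written with the `columns=` and `index=` keywords instead of `labels`/`axis`. The sketch below uses a small hypothetical `demo` dataframe to show both forms, and that `drop` returns a new dataframe rather than modifying the original:

```python
import pandas as pd

# Small stand-in dataframe (hypothetical data, not the employees table above)
demo = pd.DataFrame({
    'Name': ['Jon', 'Mark', 'Tina'],
    'Sales': [1000, 300, 400],
    'Quarter': [1, 1, 1],
})

# columns= is equivalent to passing the labels with axis=1
no_quarter = demo.drop(columns=['Quarter'])

# index= is equivalent to passing the labels with axis=0
no_first_two = demo.drop(index=[0, 1])

print(no_quarter.columns.tolist())  # ['Name', 'Sales']
print(no_first_two.index.tolist())  # [2]
print(demo.shape)                   # (3, 3): the original is unchanged
```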


# Find an element by its index or position

In [10]:
df.head(5)

  Name of Employee  Sales  Quarter Country
0              Jon   1000        1      US
1             Mark    300        1   Japan
2             Tina    400        1  Brazil
3            Maria    500        1      UK
4             Bill   1000        1  Brazil


In [21]:
print(df.loc[1],'\n\n') # by label / index

print(df.iloc[1]) # by position

Name of Employee     Mark
Sales                 300
Quarter                 1
Country             Japan
Name: 1, dtype: object 


Name of Employee     Mark
Sales                 300
Quarter                 1
Country             Japan
Name: 1, dtype: object


In the example above, the label (index) and the position of the row happen to be the same, but that is not always the case. Here is an example where they differ:

In [14]:
df3.head(3)

  Name of Employee  Sales  Quarter Country
2             Tina    400        1  Brazil
3            Maria    500        1      UK
4             Bill   1000        1  Brazil


In [16]:
df3.loc[2]

Name of Employee      Tina
Sales                  400
Quarter                  1
Country             Brazil
Name: 2, dtype: object

In [17]:
df3.iloc[0]

Name of Employee      Tina
Sales                  400
Quarter                  1
Country             Brazil
Name: 2, dtype: object
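
To make the label/position distinction concrete, here is a minimal sketch on a hypothetical `demo` dataframe. After dropping rows, `.loc` with a removed label raises a `KeyError`, while `.iloc[0]` still returns the first remaining row. Calling `reset_index(drop=True)` is one way to make labels and positions line up again:

```python
import pandas as pd

# Hypothetical data, mirroring the shape of the example above
demo = pd.DataFrame({
    'Name': ['Jon', 'Mark', 'Tina', 'Maria'],
    'Sales': [1000, 300, 400, 500],
})
trimmed = demo.drop(index=[0, 1])  # remaining labels are 2 and 3

print(trimmed.iloc[0]['Name'])  # Tina (first row by position)
print(trimmed.loc[2]['Name'])   # Tina (row whose label is 2)

# Label 0 was dropped, so a label-based lookup fails
try:
    trimmed.loc[0]
    label_found = True
except KeyError:
    label_found = False
print(label_found)  # False

# Renumber the labels from 0 so they match positions again
renumbered = trimmed.reset_index(drop=True)
print(renumbered.loc[0]['Name'])  # Tina
```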

You may want to show some specific fields of a record:

In [20]:
df3.loc[2,["Country"]]

Country    Brazil
Name: 2, dtype: object
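
The shape of the result depends on how the column is passed to `.loc`. As a small sketch with hypothetical data: a list of column names returns a Series (as in the output above), while a single column name returns the bare value:

```python
import pandas as pd

# Hypothetical two-row dataframe with labels 2 and 3
demo = pd.DataFrame(
    {'Name': ['Tina', 'Maria'], 'Country': ['Brazil', 'UK']},
    index=[2, 3],
)

as_series = demo.loc[2, ['Country']]        # list of columns -> Series
as_scalar = demo.loc[2, 'Country']          # single column   -> scalar value
several = demo.loc[2, ['Name', 'Country']]  # several fields of one record

print(type(as_series).__name__)  # Series
print(as_scalar)                 # Brazil
print(several.tolist())          # ['Tina', 'Brazil']
```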

# Create a pivot table from a dataframe

In [38]:
df.columns

Index(['Name of Employee', 'Sales', 'Quarter', 'Country'], dtype='object')

In [39]:
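# List the numeric columns explicitly so the result does not depend on how
# the installed pandas version treats non-numeric columns such as "Country"
pv = df.pivot_table(index=['Name of Employee'], values=['Sales', 'Quarter'], aggfunc='sum')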

In [40]:
pv

                  Quarter  Sales
Name of Employee
Bill                   10   1410
Jon                     8   3000
Maria                  10   1500
Mark                   10   2600
Tina                   10   2100


Note that "Country" is not included in the result: it is a categorical feature, so summing it is not meaningful.
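
As a sketch of how to control what gets aggregated, the example below uses a hypothetical `demo` dataframe and passes `values=` to name the column to sum. It also uses `columns=` to spread the totals across quarters, which is often more informative than summing the quarter numbers themselves:

```python
import pandas as pd

# Hypothetical data with one numeric measure and one categorical column
demo = pd.DataFrame({
    'Name': ['Jon', 'Mark', 'Jon', 'Mark', 'Jon'],
    'Sales': [1000, 300, 500, 700, 200],
    'Quarter': [1, 1, 2, 2, 2],
    'Country': ['US', 'Japan', 'US', 'Japan', 'UK'],
})

# Sum only the Sales column, one row per name
totals = demo.pivot_table(index='Name', values='Sales', aggfunc='sum')
print(totals.loc['Jon', 'Sales'])  # 1700

# One row per name, one column per quarter
by_quarter = demo.pivot_table(
    index='Name', columns='Quarter', values='Sales', aggfunc='sum'
)
print(by_quarter.loc['Jon', 2])   # 700
print(by_quarter.loc['Mark', 1])  # 300
```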

In the pivot table, "Name of Employee" is the index, so we can look up rows by name.

In [43]:
pv.loc["Bill", ["Sales"]]

Sales    1410
Name: Bill, dtype: int64
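
A few related lookups on a name-indexed pivot table, sketched on hypothetical data: testing whether a name is present, reading a single value, and turning the index back into an ordinary column with `reset_index()`:

```python
import pandas as pd

# Hypothetical sales records
sales = pd.DataFrame({
    'Name': ['Bill', 'Jon', 'Bill'],
    'Sales': [1000, 300, 60],
})
totals = sales.pivot_table(index='Name', values='Sales', aggfunc='sum')

print('Bill' in totals.index)       # True
print(totals.loc['Bill', 'Sales'])  # 1060

# Move the names from the index back into a regular column
flat = totals.reset_index()
print(flat.columns.tolist())        # ['Name', 'Sales']
```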