# Pandas Exercise
This exercise deals with the Pandas library.

**Remark**: Add a new cell above each expected output and work there. Otherwise, running your code would overwrite the expected result!

In [1]:
# Import packages
import pandas as pd
import numpy as np

## 1. Pandas Series and DataFrames Basics

### **Pandas Series** 

Create a **NumPy array** called **data** containing the integers 1 to 5 and another one called **indices** containing the characters A to E.

In [2]:
data = <FILL-IN>
indices = <FILL-IN>

Use the Pandas constructor **pd.Series(data, indices)** to create a Pandas Series called **series**.

In [4]:
series = <FILL-IN>
series

A    1
B    2
C    3
D    4
E    5
dtype: int64
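
As a reminder of how the constructor pairs values with labels, here is a minimal sketch on hypothetical toy data (the names `values`, `labels` and `s` are not part of the exercise). The second argument of `pd.Series` becomes the index.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, unrelated to the exercise values
values = np.array([10, 20, 30])
labels = np.array(['x', 'y', 'z'])

# The second argument is used as the index (the row labels)
s = pd.Series(values, index=labels)
print(s)
```

Passing `index=` by keyword is optional here, but it makes the role of the second array explicit.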

Access the third element of the series by using the **named index** and by using the **location index**.

3
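
The difference between the two kinds of index can be sketched as follows, again on a hypothetical toy series `s`. `.loc` looks up by label, `.iloc` looks up by zero-based position.

```python
import pandas as pd

# Hypothetical toy series with string labels
s = pd.Series([10, 20, 30], index=['x', 'y', 'z'])

by_label = s.loc['y']      # label-based lookup
by_position = s.iloc[1]    # position-based lookup, counting from 0

print(by_label, by_position)
```

Both lookups return the same element because the label `'y'` sits at position 1.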

**Square** each element of the series.

A     1
B     4
C     9
D    16
E    25
dtype: int64
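
Arithmetic on a Series is applied element by element and keeps the index. A small sketch with other operations on hypothetical data (`s`, `doubled` and `roots` are illustrative names):

```python
import numpy as np
import pandas as pd

# Hypothetical toy series
s = pd.Series([1, 4, 9], index=['x', 'y', 'z'])

doubled = s * 2       # operator applied to every element
roots = np.sqrt(s)    # NumPy functions also work element-wise and keep the index

print(doubled)
print(roots)
```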

Create another series called **seriesFromDict** using a **Python dictionary**. The **keys** of the dictionary should be the characters from **C to F** and all **values are 0.5**. Then pass the dictionary to the constructor: **pd.Series(myDict)**.

C    0.5
D    0.5
E    0.5
F    0.5
dtype: float64

**Add** both Pandas Series **series** and **seriesFromDict**.

A    NaN
B    NaN
C    3.5
D    4.5
E    5.5
F    NaN
dtype: float64
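
The NaN entries above come from index alignment: Pandas adds values label by label, and a label that is missing on one side yields NaN. A minimal sketch on hypothetical data, including the `add` method with `fill_value` as one possible way to treat missing labels as 0:

```python
import pandas as pd

# Two hypothetical series whose labels only partly overlap
left = pd.Series({'a': 1, 'b': 2, 'c': 3})
right = pd.Series({'b': 10.0, 'c': 20.0, 'd': 30.0})

# Labels 'a' and 'd' exist on one side only, so the sum is NaN there
total = left + right

# Optional: treat a missing label as 0 instead of NaN
filled = left.add(right, fill_value=0)

print(total)
print(filled)
```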

### **Pandas DataFrames**

A Pandas DataFrame is a collection of Pandas Series. Each column and each row represents a Pandas Series. A DataFrame has named indices and named columns.

Use the following three numpy arrays to **construct a DataFrame using the function pd.DataFrame()**. Name the columns **'age'**, **'country'** and **'salary'**.  
Please use the *dictionary-method* to construct the DataFrame and call it df. Afterwards, show the first 10 records (rows) of the DataFrame using the **head(n)**-method.

In [8]:
# random seed
np.random.seed(0)

# three numpy arrays
country = np.random.choice(['USA', 'GER'], size=30)
age = np.random.randint(18,75,size=30)
salary = age * 4000 + 2000 * np.random.randn()

In [13]:
df = pd.DataFrame({'age':<FILL-IN>, <FILL-IN> : country, <FILL-IN>:<FILL-IN>})
<FILL-IN>

|  | age | country | salary |
|---|---|---|---|
| 0 | 27 | USA | 112539.509248 |
| 1 | 38 | GER | 156539.509248 |
| 2 | 69 | GER | 280539.509248 |
| 3 | 34 | USA | 140539.509248 |
| 4 | 69 | GER | 280539.509248 |
| 5 | 23 | GER | 96539.509248 |
| 6 | 33 | GER | 136539.509248 |
| 7 | 65 | GER | 264539.509248 |
| 8 | 18 | GER | 76539.509248 |
| 9 | 36 | GER | 148539.509248 |
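
For reference, the dictionary method maps each column name to an array of equal length. A minimal sketch with hypothetical columns (`city` and `temp_c` are not part of the exercise):

```python
import numpy as np
import pandas as pd

# Hypothetical arrays of equal length
city = np.array(['Oslo', 'Rome', 'Lima', 'Oslo'])
temp_c = np.array([3.5, 18.0, 22.1, 4.0])

# Keys become column names, values become the column data
frame = pd.DataFrame({'city': city, 'temp_c': temp_c})

# head(n) returns the first n rows as a new DataFrame
first_two = frame.head(2)
print(first_two)
```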


What are the **data types** of the columns and how much **memory** does the DataFrame use?
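
One possible way to inspect types and memory, sketched on a hypothetical DataFrame `frame`. `deep=True` also counts the memory held by string values, which the default estimate may leave out.

```python
import pandas as pd

# Hypothetical DataFrame, not the exercise data
frame = pd.DataFrame({'city': ['Oslo', 'Rome', 'Lima'],
                      'temp_c': [3.5, 18.0, 22.1]})

# Data type of every column
print(frame.dtypes)

# Memory per column in bytes (the index is listed as well)
bytes_per_column = frame.memory_usage(deep=True)
total_bytes = bytes_per_column.sum()
print(bytes_per_column)

# info() prints dtypes, non-null counts and memory in one summary
frame.info(memory_usage='deep')
```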

Compute the **mean, median, min and max** values of the column **age** and **salary**.

**Hint:** There is one easy method to get all results at the same time.
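
A sketch of two ways to get several statistics at once, using hypothetical columns `height` and `weight`. In the output of `describe()`, the row labelled `50%` is the median.

```python
import pandas as pd

# Hypothetical DataFrame, not the exercise data
frame = pd.DataFrame({'height': [150, 160, 170, 180],
                      'weight': [50.0, 60.0, 70.0, 90.0]})

# describe() returns count, mean, std, min, quartiles and max per column
summary = frame[['height', 'weight']].describe()
median_height = summary.loc['50%', 'height']

# agg() with a list of names returns only the requested statistics
picked = frame[['height', 'weight']].agg(['mean', 'median', 'min', 'max'])

print(summary)
print(picked)
```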

#### Conditional Indexing:

Please answer the following questions using conditional indexing. Remember to wrap each condition in parentheses and to combine conditions with the & (and) or | (or) operator.

1. How many people are older than 20?
2. How many people from Germany are older than 20?
3. How many people earn more than 200 000 or are older than 30?
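
The pattern behind such questions can be sketched on a hypothetical DataFrame (`city` and `temp_c` are illustrative columns). A comparison yields a boolean Series, which either filters the rows or can be summed to count the matches.

```python
import pandas as pd

# Hypothetical DataFrame, not the exercise data
frame = pd.DataFrame({'city': ['Oslo', 'Rome', 'Lima', 'Oslo'],
                      'temp_c': [3.5, 18.0, 22.1, 4.0]})

# Single condition: keep rows where the mask is True
warm = frame[frame['temp_c'] > 10]

# Two conditions joined with & (and); each one needs its own parentheses
oslo_and_cold = frame[(frame['city'] == 'Oslo') & (frame['temp_c'] < 4)]

# Two conditions joined with | (or)
rome_or_hot = frame[(frame['city'] == 'Rome') | (frame['temp_c'] > 20)]

# Counting: True counts as 1 when a boolean mask is summed
n_warm = (frame['temp_c'] > 10).sum()
print(len(warm), len(oslo_and_cold), len(rome_or_hot), n_warm)
```

The parentheses matter because `&` and `|` bind more tightly than comparison operators in Python.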


#### Basic Aggregations and Operations on DataFrames

Compute the **mean and standard deviation** of the age column **grouped by the country**.

In [None]:
df_agg = df.groupby(<FILL-IN>).agg({<FILL-IN>: ['mean', <FILL-IN>]})

| country | (age, mean) | (age, std) |
|---|---|---|
| GER | 46.647059 | 17.620802 |
| USA | 41.846154 | 15.175892 |


**Access the mean of the age column** for the country GER.

**Hint:** For multi-indices/multi-columns use a **tuple** e.g. df[('outer-col', 'inner-col')].

46.64705882352941
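
Aggregating with a list of functions produces two-level column names, which are addressed with a tuple. A minimal sketch on hypothetical data (`city`, `temp_c` and `agg_frame` are illustrative names):

```python
import pandas as pd

# Hypothetical DataFrame, not the exercise data
frame = pd.DataFrame({'city': ['Oslo', 'Rome', 'Oslo', 'Rome'],
                      'temp_c': [2.0, 18.0, 4.0, 20.0]})

# One row per group, columns ('temp_c', 'mean') and ('temp_c', 'std')
agg_frame = frame.groupby('city').agg({'temp_c': ['mean', 'std']})

# A whole column is selected with a tuple (outer name, inner name)
mean_column = agg_frame[('temp_c', 'mean')]

# A single cell: row label first, then the column tuple
oslo_mean = agg_frame.loc['Oslo', ('temp_c', 'mean')]
print(oslo_mean)
```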

**Apply a function to the dataframe df** which transforms the uppercase country names of the column country to lowercase ones.
Add a new column named **country_lowercase** which holds the lowercase country names.

**Hint:** Use the apply method and pass it a lambda function that does something like 'HELLO'.lower() --> 'hello'.

In [None]:
df[<FILL-IN>] = df[<FILL-IN>].apply(<FILL-IN>)
df.head()

|  | age | country | salary | country_lowercase |
|---|---|---|---|---|
| 0 | 27 | USA | 112539.509248 | usa |
| 1 | 38 | GER | 156539.509248 | ger |
| 2 | 69 | GER | 280539.509248 | ger |
| 3 | 34 | USA | 140539.509248 | usa |
| 4 | 69 | GER | 280539.509248 | ger |
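
As a side note, `apply` calls the given function once per element. For plain string operations, the `.str` accessor is a common vectorized alternative. A sketch with a different transformation on hypothetical data:

```python
import pandas as pd

# Hypothetical DataFrame, not the exercise data
frame = pd.DataFrame({'city': ['Oslo', 'Rome']})

# apply: the lambda receives one value at a time
applied = frame['city'].apply(lambda name: name.upper())

# .str accessor: the same string method applied to the whole column
vectorized = frame['city'].str.upper()

frame['city_upper'] = applied
print(frame)
```

Both variants give the same result here. `apply` is the more general tool because the lambda can contain any Python logic.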


Add a new column which holds the salary per year of life for each entry. Name the column **salary_per_yol**.

|  | age | country | salary | salary_per_yol |
|---|---|---|---|---|
| 0 | 27 | USA | 112539.509248 | 4168.129972 |
| 1 | 38 | GER | 156539.509248 | 4119.46077 |
| 2 | 69 | GER | 280539.509248 | 4065.789989 |
| 3 | 34 | USA | 140539.509248 | 4133.514978 |
| 4 | 69 | GER | 280539.509248 | 4065.789989 |
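
New columns can be derived directly from existing ones, because arithmetic between two columns works row by row. A minimal sketch with hypothetical columns `total` and `quantity`:

```python
import pandas as pd

# Hypothetical DataFrame, not the exercise data
frame = pd.DataFrame({'total': [10.0, 30.0], 'quantity': [2, 5]})

# Row-wise division of two columns, stored under a new column name
frame['unit_price'] = frame['total'] / frame['quantity']
print(frame)
```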


#### Join DataFrames

The two cells below create two DataFrames. Please execute both cells and then continue.

In [84]:
# just execute
df_animals = pd.DataFrame({'Animal': ['Cat', 'Dog', 'Bird', 'Lion', 'Cow', 'Pig'],
                           'Sound': ['miau', 'wuff', 'chirp', 'rawr', 'moo', 'oink']})
df_animals

|  | Animal | Sound |
|---|---|---|
| 0 | Cat | miau |
| 1 | Dog | wuff |
| 2 | Bird | chirp |
| 3 | Lion | rawr |
| 4 | Cow | moo |
| 5 | Pig | oink |


In [85]:
# just execute
df_names = pd.DataFrame({'Animal': ['Cat', 'Cat', 'Bird', 'Dog', 'Pig'],
                         'Name': ['Sylvester', 'Carlo', 'Tweety', 'Balto', 'Babe']})
df_names

|  | Animal | Name |
|---|---|---|
| 0 | Cat | Sylvester |
| 1 | Cat | Carlo |
| 2 | Bird | Tweety |
| 3 | Dog | Balto |
| 4 | Pig | Babe |


**Merge** the two dataframes by using the column **Animal as the key**. To join DataFrames you can use the method **pd.merge(df1,df2,on=column)**. Please perform an **inner** and an **outer** join.

In [None]:
# inner join
df_inner = <FILL-IN>
df_inner

|  | Animal | Sound | Name |
|---|---|---|---|
| 0 | Cat | miau | Sylvester |
| 1 | Cat | miau | Carlo |
| 2 | Dog | wuff | Balto |
| 3 | Bird | chirp | Tweety |
| 4 | Pig | oink | Babe |


In [None]:
# outer join
df_outer = pd.merge(<FILL-IN>, how='outer', on=<FILL-IN>)
df_outer

|  | Animal | Sound | Name |
|---|---|---|---|
| 0 | Cat | miau | Sylvester |
| 1 | Cat | miau | Carlo |
| 2 | Dog | wuff | Balto |
| 3 | Bird | chirp | Tweety |
| 4 | Lion | rawr | NaN |
| 5 | Cow | moo | NaN |
| 6 | Pig | oink | Babe |
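
The difference between the two join types can be sketched on two small hypothetical DataFrames (`left`, `right` and their columns are illustrative). An inner join keeps only keys present in both frames. An outer join keeps every key and fills the gaps with NaN. The optional `indicator=True` adds a `_merge` column that shows where each row came from. Note that the row order of an outer join may differ between pandas versions.

```python
import pandas as pd

# Two hypothetical frames sharing the column 'key'
left = pd.DataFrame({'key': ['a', 'b', 'c'], 'lval': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'rval': [20, 30, 40]})

# Inner join: only keys found in both frames ('b' and 'c')
inner = pd.merge(left, right, on='key', how='inner')

# Outer join: all keys; missing values become NaN
outer = pd.merge(left, right, on='key', how='outer', indicator=True)

print(inner)
print(outer)
```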


**Drop the rows which contain NaN** by using the method **dropna()** on the merged dataframe df_outer.

In [None]:
# drop row
df_outer.<FILL-IN>

|  | Animal | Sound | Name |
|---|---|---|---|
| 0 | Cat | miau | Sylvester |
| 1 | Cat | miau | Carlo |
| 2 | Dog | wuff | Balto |
| 3 | Bird | chirp | Tweety |
| 6 | Pig | oink | Babe |


**Drop the column** of df_outer which contains NaN using **dropna()**.

In [None]:
# drop column
df_outer.<FILL-IN>

|  | Animal | Sound |
|---|---|---|
| 0 | Cat | miau |
| 1 | Cat | miau |
| 2 | Dog | wuff |
| 3 | Bird | chirp |
| 4 | Lion | rawr |
| 5 | Cow | moo |
| 6 | Pig | oink |
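
How `dropna()` chooses between rows and columns can be sketched on a hypothetical DataFrame. The `axis` parameter decides the direction, and by default the method returns a new DataFrame instead of changing the original.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with one missing value
frame = pd.DataFrame({'key': ['a', 'b', 'c'],
                      'full': [1, 2, 3],
                      'gappy': [0.1, np.nan, 0.3]})

# Default (axis=0): drop every row that contains a NaN
rows_dropped = frame.dropna()

# axis='columns' (or axis=1): drop every column that contains a NaN
cols_dropped = frame.dropna(axis='columns')

print(rows_dropped)
print(cols_dropped)
```

The original row labels are kept after dropping rows, which is why the expected output above jumps from index 3 to index 6.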


**This is the end of this exercise.**