### Pandas Exercise Solutions

### 1. Stock prices

The file *apple_share_price.csv* contains share prices of Apple's stock over a period of two years. We have loaded the data and used the 'Date' column as an index. 

Write code to do the following:

* List the closing prices for days between Jan 01, 2017 and Jan 15, 2017. Note that the prices are in reverse chronological order.

* Find the mean closing price during the above period.


In [1]:
import pandas as pd
dfp = pd.read_csv("misc/apple_share_price.csv",parse_dates={'date':[0]},
                  index_col="date")
dfp.head(3)

# Retrieve the share prices between Jan 01, 2017 and Jan 15, 2017.
# The index is in reverse chronological order, so the later date comes first in the slice.

dfp.loc['2017-01-15':'2017-01-01']


date,Open,High,Low,Close,Volume
2017-01-13,119.11,119.62,118.81,119.04,26111948
2017-01-12,118.9,119.3,118.21,119.25,27086220
2017-01-11,118.74,119.93,118.6,119.75,27588593
2017-01-10,118.77,119.38,118.3,119.11,24462051
2017-01-09,117.95,119.43,117.94,118.99,33561948
2017-01-06,116.78,118.16,116.47,117.91,31751900
2017-01-05,115.92,116.86,115.81,116.61,22193587
2017-01-04,115.85,116.51,115.75,116.02,21118116
2017-01-03,115.8,116.33,114.76,116.15,28781865


In [2]:
# Mean price during the above period
dfp.loc['2017-01-15':'2017-01-01']['Close'].mean()

118.09222222222222
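
The behaviour of label slices with reversed bounds on a descending `DatetimeIndex` may depend on the installed pandas version, so it can be useful to know two selections that do not rely on it. The sketch below uses a small hypothetical frame named `prices`: the three January closes are copied from the output above, while the 2017-01-17 row is made up to show a date outside the window.

```python
import pandas as pd

# Hypothetical sample in reverse chronological order. The three January
# closes come from the output above; the 2017-01-17 value is made up.
prices = pd.DataFrame(
    {"Close": [120.00, 119.04, 119.25, 116.15]},
    index=pd.to_datetime(["2017-01-17", "2017-01-13", "2017-01-12", "2017-01-03"]),
)

# Option 1: boolean mask on the index; keeps the original (descending) order.
start, end = pd.Timestamp("2017-01-01"), pd.Timestamp("2017-01-15")
in_window = prices[(prices.index >= start) & (prices.index <= end)]

# Option 2: sort ascending first, then slice with chronological bounds.
chronological = prices.sort_index().loc["2017-01-01":"2017-01-15"]

print(in_window["Close"].tolist())
print(chronological["Close"].tolist())
print(in_window["Close"].mean())
```

Both options select the same rows; they differ only in the order in which the rows come back.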

### 2. Day or Night?

In this problem, you need to solve a classification problem. There are 10 pictures in the folder <code>misc/day_night_pics</code>. The first five were taken during the daytime and the rest at night. Each image is of size 100x100.

The file [day_night_pics.csv](./misc/day_night_pics.csv) has 17 columns. The first ten columns encode the ten images mentioned above. For each pixel, there are three values: (R,G,B). So an image is described by 3x100x100 values. The first three rows contain the R,G,B values of the top-leftmost pixel. The next three rows contain the RGB values of the next pixel, and so on. Hence, there are 30000 rows.

Columns 10-16 have data for seven more images (which you can't see). Which of these images do you think were taken during the day?

In [3]:
import pandas as pd
df = pd.read_csv('misc/day_night_pics.csv')

In [4]:
df.shape

(30000, 17)

In [5]:
# Answer and reasoning:
# Daytime pictures are brighter than night-time pictures, so they have higher pixel values.
# The known daytime images (0-4) have mean intensities above 110, the known night-time images (5-9) below 75.
# Looking at the mean pixel intensity, we can guess that pics 10, 13 and 14 were taken during the daytime.
df.mean()

0     173.326800
1     114.492467
2     178.534900
3     140.377000
4     188.585367
5      71.076800
6      31.955933
7      24.146100
8      35.944133
9      44.112767
10    117.929033
11     34.212333
12     71.185433
13    112.310900
14    137.300200
15     60.663767
dtype: float64
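
The visual comparison above can be turned into an explicit rule. The following is a minimal sketch, assuming a midpoint threshold between the average brightness of the labelled day images and that of the labelled night images; this threshold rule is an illustrative choice, not part of the original solution. The values in `means` are copied from the `df.mean()` output above.

```python
import pandas as pd

# Column means copied from the df.mean() output above (columns 0-15).
means = pd.Series(
    [173.326800, 114.492467, 178.534900, 140.377000, 188.585367,
     71.076800, 31.955933, 24.146100, 35.944133, 44.112767,
     117.929033, 34.212333, 71.185433, 112.310900, 137.300200, 60.663767]
)

day_mean = means.iloc[:5].mean()       # images 0-4: labelled daytime
night_mean = means.iloc[5:10].mean()   # images 5-9: labelled night-time

# Assumed decision rule: midpoint between the two class averages.
threshold = (day_mean + night_mean) / 2

# Classify the remaining, unlabelled images by comparing to the threshold.
unlabelled = means.iloc[10:]
predicted_day = unlabelled[unlabelled > threshold].index.tolist()
print(threshold)
print(predicted_day)
```

With these numbers the threshold lands near 100, and the rule picks the same three images named in the reasoning above.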

### 3. Student Marks 

Consider the student dataset used in the example workbook. Write Python code to do the following:

* Create a third column called "Average" which contains the average of Marks1 and Marks2.
* Rank the students according to their average marks.
* Find the students whose Marks1 is above 30 and Marks2 is below 25.
* Capitalize the names of the students (use the apply function).
* Create a new column called 'Sex' and set it to 'F' if the student's name ends with the letter 'a' and 'M' otherwise.

In [6]:
import pandas as pd

df = pd.read_csv('misc/studentmarks2.csv', sep=",", header=None)
df.columns = ['Name', 'Marks1', 'Marks2']

In [7]:
# 1. Avg. marks
df['Average'] = df[['Marks1','Marks2']].mean(axis="columns")  
# axis="columns" computes the mean across the columns of each row. axis=1 does the same thing.


In [8]:
# 2. Ranking
df.sort_values(by=['Average'], ascending=False)

Unnamed: 0,Name,Marks1,Marks2,Average
9,vikram,45,30,37.5
1,sandesh,20,45,32.5
3,ranjan,40,25,32.5
2,adil,30,30,30.0
10,asha,30,30,30.0
7,aryan,37,20,28.5
8,soumya,40,15,27.5
0,priya,25,25,25.0
5,james,15,34,24.5
6,himanshu,20,20,20.0
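
Sorting orders the rows but does not attach a rank number. If an explicit rank column is wanted, `Series.rank` is one option. The sketch below uses a hypothetical frame `marks` holding four rows copied from the output above, and assumes that tied students should share the lowest applicable rank (`method="min"`).

```python
import pandas as pd

# Four rows copied from the ranking output above.
marks = pd.DataFrame(
    {"Name": ["vikram", "sandesh", "ranjan", "adil"],
     "Average": [37.5, 32.5, 32.5, 30.0]}
)

# Highest average gets rank 1; ties share the lowest rank in their group.
marks["Rank"] = marks["Average"].rank(ascending=False, method="min").astype(int)
print(marks)
```

Other tie-breaking policies are available through the `method` argument, for example `"dense"` or `"first"`.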


In [9]:
# 3. Selection criteria

criteria1 = df['Marks1'] > 30
criteria2 = df['Marks2'] < 25
df[ criteria1 & criteria2 ]

Unnamed: 0,Name,Marks1,Marks2,Average
7,aryan,37,20,28.5
8,soumya,40,15,27.5
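
The same selection can also be written with `DataFrame.query`, which some readers find easier to scan. This is a minimal sketch on a hypothetical frame `marks` with three rows taken from the outputs above.

```python
import pandas as pd

# Three rows copied from the outputs above.
marks = pd.DataFrame(
    {"Name": ["aryan", "soumya", "vikram"],
     "Marks1": [37, 40, 45],
     "Marks2": [20, 15, 30]}
)

# Equivalent to marks[(marks['Marks1'] > 30) & (marks['Marks2'] < 25)]
selected = marks.query("Marks1 > 30 and Marks2 < 25")
print(selected)
```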


In [10]:
# 4. Capitalize the names
def capitalize_name(n):
    # Upper-case the first letter and lower-case the rest
    return n.capitalize()
df['Name'] = df['Name'].apply(capitalize_name)
df

Unnamed: 0,Name,Marks1,Marks2,Average
0,Priya,25,25,25.0
1,Sandesh,20,45,32.5
2,Adil,30,30,30.0
3,Ranjan,40,25,32.5
4,Shubha,20,15,17.5
5,James,15,34,24.5
6,Himanshu,20,20,20.0
7,Aryan,37,20,28.5
8,Soumya,40,15,27.5
9,Vikram,45,30,37.5
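
The exercise asks for `apply`, but for string columns the vectorized `.str` accessor gives the same result without a helper function. A minimal sketch on a hypothetical Series `names`:

```python
import pandas as pd

# Three names copied from the dataset above, in their original lower case.
names = pd.Series(["priya", "sandesh", "adil"])

# Vectorized equivalent of names.apply(lambda n: n.capitalize())
capitalized = names.str.capitalize()
print(capitalized.tolist())
```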


In [11]:
# 5. Add the sex column
def add_gender(x):
    return "F" if x['Name'].endswith('a') else "M"

df['Sex'] = df.apply(add_gender,axis=1)
df

Unnamed: 0,Name,Marks1,Marks2,Average,Sex
0,Priya,25,25,25.0,F
1,Sandesh,20,45,32.5,M
2,Adil,30,30,30.0,M
3,Ranjan,40,25,32.5,M
4,Shubha,20,15,17.5,F
5,James,15,34,24.5,M
6,Himanshu,20,20,20.0,M
7,Aryan,37,20,28.5,M
8,Soumya,40,15,27.5,F
9,Vikram,45,30,37.5,M
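
A row-wise `apply` works, but the same column can be built in one vectorized step. The sketch below is one possible alternative, assuming NumPy is available; the frame `students` is a hypothetical sample with four names from the output above.

```python
import numpy as np
import pandas as pd

# Four names copied from the output above.
students = pd.DataFrame({"Name": ["Priya", "Sandesh", "Shubha", "Vikram"]})

# 'F' where the name ends with 'a', 'M' otherwise.
students["Sex"] = np.where(students["Name"].str.endswith("a"), "F", "M")
print(students)
```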


### 4. Balance calculator

The file [transactions.csv](misc/transactions.csv) contains records of transactions between a group of seven people. Each row has three fields: From, To and Amount. An entry "Asha, Kabir, 20" means that Asha gave Rs. 20 to Kabir. Write a program to find the net balances of all seven individuals after the transactions. It is possible for a person to have a negative balance. Assume that everybody has Rs. 0 to begin with.

*Hint: Use the melt function*


In [12]:
import pandas as pd
df = pd.read_csv('misc/transactions.csv')

In [13]:
df.head()

Unnamed: 0,From,To,Amount
0,Asha,Kabir,20
1,Sumanth,Asha,10
2,Sumanth,Kabir,5
3,Kabir,Rahul,5
4,Rahul,Asha,25


In [14]:
# Reshape so that each transaction yields two rows:
# one for the sender ('From') and one for the receiver ('To').
df_m = pd.melt(df, id_vars=['Amount'], value_vars=['From', 'To'],
               var_name='Side', value_name='Name')

In [15]:
def negate(x):
    if x['Side']=='From':
        return -1*x['Amount']
    else:
        return x['Amount']
    
df_m['Amount'] = df_m.apply(negate,axis=1)
df_m

Unnamed: 0,Amount,Side,Name
0,-20,From,Asha
1,-10,From,Sumanth
2,-5,From,Sumanth
3,-5,From,Kabir
4,-25,From,Rahul
5,-30,From,Vinod
6,-15,From,Asha
7,-15,From,Vinod
8,-35,From,Kabir
9,-25,From,Shyam


In [16]:
# Sum only the numeric Amount column; 'Side' holds strings.
df_m.groupby('Name')[['Amount']].sum()

Name,Amount
Asha,25
Kabir,-15
Kamal,5
Rahul,10
Shyam,-30
Sumanth,0
Vinod,5
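
The hint suggests `melt`, but the balance can also be computed as money received minus money sent. The sketch below is an alternative under that reading; the frame `tx` contains only the five rows shown by `df.head()` above, so its balances differ from the full-file result.

```python
import pandas as pd

# The five transactions shown by df.head() above (not the full file).
tx = pd.DataFrame(
    {"From": ["Asha", "Sumanth", "Sumanth", "Kabir", "Rahul"],
     "To": ["Kabir", "Asha", "Kabir", "Rahul", "Asha"],
     "Amount": [20, 10, 5, 5, 25]}
)

received = tx.groupby("To")["Amount"].sum()
sent = tx.groupby("From")["Amount"].sum()

# fill_value=0 covers people who only sent or only received money.
balance = received.sub(sent, fill_value=0).astype(int).sort_index()
print(balance)
```

Because every rupee sent is also received by someone, the balances always sum to zero, which makes a handy sanity check.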
