### Welcome to the Answer notebook for Module 1 ! These notebooks have been provided to code and solve all the queries asked in the modules.
This environment has all the necessary libraries pre-installed, and all the Stock, Commodities and Index data files available at the following location -
#——————————————————————————————————————————————————————————————————–#
Import the csv file of the stock of your choosing using ‘pd.read_csv()’ function into a dataframe. Shares of a company can be offered in more than one category. The category of a stock is indicated in the ‘Series’ column. If the csv file has data on more than one category, the ‘Date’ column will have repeating values. To avoid repetitions in the date, remove all the rows where ‘Series’ column is NOT ‘EQ’. Analyze and understand each column properly. You’d find the head(), tail() and describe() functions to be immensely useful for exploration. You’re free to carry out any other exploration of your own.
#The solution code should start right after the query statement, for example -
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
df = pd.read_csv('PNB.csv')
print(df.head())
print(df.tail())
df.describe()
Symbol Series Date Prev Close Open Price High Price Low Price 0 PNB EQ 15-May-2017 169.15 169.95 171.30 166.2 1 PNB EQ 16-May-2017 166.80 167.35 178.00 164.8 2 PNB EQ 17-May-2017 174.15 173.00 173.10 164.3 3 PNB EQ 18-May-2017 165.05 163.40 164.40 157.9 4 PNB EQ 19-May-2017 158.50 159.50 162.25 154.1 Last Price Close Price Average Price Total Traded Quantity 0 166.85 166.80 168.94 7476618 1 173.20 174.15 173.13 51532461 2 164.80 165.05 167.27 26536944 3 158.25 158.50 161.06 13905573 4 155.20 155.20 157.59 16311602 Turnover No. of Trades Deliverable Qty % Dly Qt to Traded Qty 0 1.263123e+09 41641 1352589 18.09 1 8.921904e+09 254027 5804867 11.26 2 4.438942e+09 116512 7037498 26.52 3 2.239613e+09 76746 3245443 23.34 4 2.570540e+09 82394 2816963 17.27 Symbol Series Date Prev Close Open Price High Price Low Price 489 PNB EQ 07-May-2019 86.25 86.70 87.15 83.40 490 PNB EQ 08-May-2019 83.95 83.75 85.40 83.25 491 PNB EQ 09-May-2019 83.95 84.10 85.35 82.80 492 PNB EQ 10-May-2019 84.25 85.00 87.25 84.40 493 PNB EQ 13-May-2019 86.25 86.15 86.20 79.05 Last Price Close Price Average Price Total Traded Quantity 489 84.05 83.95 85.62 16356567 490 84.05 83.95 84.43 23380690 491 84.50 84.25 84.30 22934344 492 86.00 86.25 85.53 21821500 493 79.90 79.80 82.13 34293332 Turnover No. of Trades Deliverable Qty % Dly Qt to Traded Qty 489 1.400418e+09 37679 2333933 14.27 490 1.974095e+09 52244 2674859 11.44 491 1.933317e+09 44704 2025433 8.83 492 1.866353e+09 44879 1950720 8.94 493 2.816529e+09 88162 7102109 20.71
Prev Close | Open Price | High Price | Low Price | Last Price | Close Price | Average Price | Total Traded Quantity | Turnover | No. of Trades | Deliverable Qty | % Dly Qt to Traded Qty | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 494.000000 | 494.000000 | 494.000000 | 494.000000 | 494.000000 | 494.00000 | 494.000000 | 4.940000e+02 | 4.940000e+02 | 494.000000 | 4.940000e+02 | 494.000000 |
mean | 112.380870 | 112.576822 | 114.621255 | 110.243623 | 112.187551 | 112.20000 | 112.462085 | 2.515651e+07 | 2.671164e+09 | 74657.987854 | 4.169411e+06 | 17.631721 |
std | 40.026359 | 40.144154 | 41.033817 | 39.184032 | 40.033928 | 39.97115 | 40.067790 | 2.501765e+07 | 3.160705e+09 | 73820.770037 | 4.510557e+06 | 7.346969 |
min | 59.700000 | 60.000000 | 63.400000 | 58.450000 | 59.750000 | 59.70000 | 60.730000 | 1.148287e+06 | 1.492829e+08 | 6684.000000 | 3.466790e+05 | 5.180000 |
25% | 79.300000 | 79.512500 | 80.850000 | 78.112500 | 79.275000 | 79.30000 | 79.517500 | 1.187260e+07 | 1.275810e+09 | 39754.250000 | 1.912565e+06 | 12.650000 |
50% | 93.400000 | 93.775000 | 94.975000 | 92.350000 | 93.325000 | 93.37500 | 93.625000 | 1.896336e+07 | 1.854954e+09 | 54358.000000 | 2.932769e+06 | 16.145000 |
75% | 147.450000 | 148.400000 | 150.250000 | 144.937500 | 147.137500 | 147.40000 | 147.970000 | 2.825566e+07 | 2.818940e+09 | 80766.500000 | 4.841530e+06 | 20.817500 |
max | 213.600000 | 214.000000 | 231.450000 | 201.100000 | 218.000000 | 213.60000 | 213.210000 | 2.107817e+08 | 3.137858e+10 | 626019.000000 | 5.128026e+07 | 53.840000 |
Calculate the maximum, minimum and mean price for the last 90 days. (price=Closing Price unless stated otherwise)
data=np.array(df['Close Price'])
data = data[-90:]
print(data)
print(np.max(data))
print(np.min(data))
print(np.mean(data))
[78.1 79.8 77.85 77.85 81.3 80.55 81.7 80.7 81.25 80.65 82.5 81.95 84.25 84.55 82.85 80.25 78.6 77.75 78.1 76.6 76.1 77.3 77.25 77.5 73.9 73.15 73.4 75.2 75.65 71.2 70.85 71.35 69.45 72.25 70.05 69. 69.95 71.1 73.15 73.2 73.5 71.85 71.5 72.3 76.65 82.65 82.4 85.45 83.8 85.85 85.4 84.85 84.45 86.05 86.65 90.5 93.55 91.55 90.3 92.5 93.4 95.35 95.5 97.5 98.3 94.45 93.65 93.1 93.1 94.35 93.35 92.7 93.35 93.85 92.8 90.05 86.65 87.2 88.35 86.75 88.15 84.9 86.1 87.15 86.25 83.95 83.95 84.25 86.25 79.8 ] 98.3 69.0 82.71555555555557
Analyse the data types for each column of the dataframe. Pandas knows how to deal with dates in an intelligent manner. But to make use of Pandas functionality for dates, you need to ensure that the column is of type ‘datetime64(ns)’. Change the date column from ‘object’ type to ‘datetime64(ns)’ for future convenience. See what happens if you subtract the minimum value of the date column from the maximum value.
print(df.dtypes)
df['Date']=pd.to_datetime(df.Date)
print("\n\nChanged the date column from 'object' type to 'datetime64(ns)''")
print(df.dtypes)
Symbol object Series object Date object Prev Close float64 Open Price float64 High Price float64 Low Price float64 Last Price float64 Close Price float64 Average Price float64 Total Traded Quantity int64 Turnover float64 No. of Trades int64 Deliverable Qty int64 % Dly Qt to Traded Qty float64 dtype: object Changed the date column from 'object' type to 'datetime64(ns)'' Symbol object Series object Date datetime64[ns] Prev Close float64 Open Price float64 High Price float64 Low Price float64 Last Price float64 Close Price float64 Average Price float64 Total Traded Quantity int64 Turnover float64 No. of Trades int64 Deliverable Qty int64 % Dly Qt to Traded Qty float64 dtype: object
In a separate array , calculate the monthwise VWAP (Volume Weighted Average Price ) of the stock. ( VWAP = sum(price*volume)/sum(volume) ) To know more about VWAP , visit - VWAP definition {Hint : Create a new dataframe column ‘Month’. The values for this column can be derived from the ‘Date” column by using appropriate pandas functions. Similarly, create a column ‘Year’ and initialize it. Then use the ‘groupby()’ function by month and year. Finally, calculate the vwap value for each month (i.e. for each group created).
df['Month']=pd.Series([i.month_name() for i in df['Date']])
df['Year']=pd.Series([i.year for i in df['Date']])
new = df
new['price*volume']=new['Close Price']*new['Total Traded Quantity']
new=new.loc[:, ['Month', 'Year','price*volume', 'Total Traded Quantity']].groupby(['Month', 'Year'])
new=new['price*volume', 'Total Traded Quantity'].agg(np.sum)
new['VWAP']=new['price*volume']/new['Total Traded Quantity']
print(new.loc[:, ['VWAP']])
VWAP Month Year April 2018 98.452684 2019 92.630680 August 2017 147.583982 2018 84.404688 December 2017 173.158001 2018 73.083889 February 2018 119.843376 2019 72.407841 January 2018 177.425422 2019 80.117278 July 2017 152.844298 2018 79.282495 June 2017 146.144594 2018 84.127327 March 2018 97.928249 2019 88.435446 May 2017 158.529155 2018 82.944949 2019 84.371657 November 2017 193.098141 2018 70.891962 October 2017 193.005039 2018 66.537495 September 2017 139.951927 2018 74.282560
Write a function to calculate the average price over the last N days of the stock price data where N is a user defined parameter. Write a second function to calculate the profit/loss percentage over the last N days. Calculate the average price AND the profit/loss percentages over the course of last - 1 week, 2 weeks, 1 month, 3 months, 6 months and 1 year. {Note : Profit/Loss percentage between N days is the percentage change between the closing prices of the 2 days }
def avg_price_lastNdays(df, N):
X=np.array(df['Close Price'])
return np.average(X[-N:])#.agg(np.mean)
def profit_loss_Ndays(df, N):
X=np.array(df['Close Price'])
return np.absolute(X[-1]-X[-N])*100/X[-N]
index = df.columns.get_loc("Close Price")
print("Profit/Loss percentage over 1 week = % 0.2f%%" %(profit_loss_Ndays(df, 7)))
print("Profit/Loss percentage over 2 weeks = % 0.2f%%" %(profit_loss_Ndays(df, 14)))
print("Profit/Loss percentage over 1 month = % 0.2f%%" %(profit_loss_Ndays(df, 30)))
print("Profit/Loss percentage over 3 months = % 0.2f%%" %(profit_loss_Ndays(df, 90)))
print("Profit/Loss percentage over 6 months = % 0.2f%%" %(profit_loss_Ndays(df, 180)))
print("Profit/Loss percentage over 1 Year = % 0.2f%%" %(profit_loss_Ndays(df, 365)))
print("Average Price over 1 week = % 0.2f" %(avg_price_lastNdays(df, 7)))
print("Average Price over 2 weeks = % 0.2f" %(avg_price_lastNdays(df, 14)))
print("Average Price over 1 month = % 0.2f" %(avg_price_lastNdays(df, 30)))
print("Average Price over 3 months = % 0.2f" %(avg_price_lastNdays(df, 90)))
print("Average Price over 6 months = % 0.2f" %(avg_price_lastNdays(df, 180)))
print("Average Price over 1 Year = % 0.2f" %(avg_price_lastNdays(df, 365)))
Profit/Loss percentage over 1 week = 8.43% Profit/Loss percentage over 2 weeks = 7.91% Profit/Loss percentage over 1 month = 14.56% Profit/Loss percentage over 3 months = 2.18% Profit/Loss percentage over 6 months = 0.13% Profit/Loss percentage over 1 Year = 58.18% Average Price over 1 week = 84.51 Average Price over 2 weeks = 85.69 Average Price over 1 month = 90.15 Average Price over 3 months = 82.72 Average Price over 6 months = 77.99 Average Price over 1 Year = 98.09
Add a column ‘Day_Perc_Change’ where the values are the daily change in percentages i.e. the percentage change between 2 consecutive day’s closing prices. Instead of using the basic mathematical formula for computing the same, use ‘pct_change()’ function provided by Pandas for dataframes. You will note that the first entry of the column will have a ‘Nan’ value. Why does this happen? Either remove the first row, or set the entry to 0 before proceeding.
df=df.drop(columns=['price*volume'])
Z = df['Close Price'].pct_change()
Z[0]=0
df['Day_Perc_Change']=Z
df.head()
Symbol | Series | Date | Prev Close | Open Price | High Price | Low Price | Last Price | Close Price | Average Price | Total Traded Quantity | Turnover | No. of Trades | Deliverable Qty | % Dly Qt to Traded Qty | Month | Year | Day_Perc_Change | Trend | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | PNB | EQ | 2017-05-15 | 169.15 | 169.95 | 171.30 | 166.2 | 166.85 | 166.80 | 168.94 | 7476618 | 1.263123e+09 | 41641 | 1352589 | 18.09 | May | 2017 | 0.000000 | Slight or No change |
1 | PNB | EQ | 2017-05-16 | 166.80 | 167.35 | 178.00 | 164.8 | 173.20 | 174.15 | 173.13 | 51532461 | 8.921904e+09 | 254027 | 5804867 | 11.26 | May | 2017 | 0.044065 | Slight or No change |
2 | PNB | EQ | 2017-05-17 | 174.15 | 173.00 | 173.10 | 164.3 | 164.80 | 165.05 | 167.27 | 26536944 | 4.438942e+09 | 116512 | 7037498 | 26.52 | May | 2017 | -0.052254 | Slight or No change |
3 | PNB | EQ | 2017-05-18 | 165.05 | 163.40 | 164.40 | 157.9 | 158.25 | 158.50 | 161.06 | 13905573 | 2.239613e+09 | 76746 | 3245443 | 23.34 | May | 2017 | -0.039685 | Slight or No change |
4 | PNB | EQ | 2017-05-19 | 158.50 | 159.50 | 162.25 | 154.1 | 155.20 | 155.20 | 157.59 | 16311602 | 2.570540e+09 | 82394 | 2816963 | 17.27 | May | 2017 | -0.020820 | Slight or No change |
Add another column ‘Trend’ whose values are: ‘Slight or No change’ for ‘Day_Perc_Change’ in between -0.5 and 0.5 ‘Slight positive’ for ‘Day_Perc_Change’ in between 0.5 and 1 ‘Slight negative’ for ‘Day_Perc_Change’ in between -0.5 and -1 ‘Positive’ for ‘Day_Perc_Change’ in between 1 and 3 ‘Negative’ for ‘Day_Perc_Change’ in between -1 and -3 ‘Among top gainers’ for ‘Day_Perc_Change’ in between 3 and 7 ‘Among top losers’ for ‘Day_Perc_Change’ in between -3 and -7 ‘Bull run’ for ‘Day_Perc_Change’ >7 ‘Bear drop’ for ‘Day_Perc_Change’ <-7
def getTrend(n):
if -0.5<n<=0.5:
return 'Slight or No change'
if 0.5<n<=1:
return 'Slight negative'
if 1<n<=3:
return 'Positive'
if -3<n<=-1:
return 'Negative'
if 3<n<=7:
return 'Among top gainers'
if -7<n<=-3:
return 'Among top losers'
if n>7:
return 'Bull run'
if n<-7:
return 'Bear drop'
df['Trend']=pd.Series([getTrend(i) for i in df['Day_Perc_Change']])
df.head()
Symbol | Series | Date | Prev Close | Open Price | High Price | Low Price | Last Price | Close Price | Average Price | Total Traded Quantity | Turnover | No. of Trades | Deliverable Qty | % Dly Qt to Traded Qty | Month | Year | Day_Perc_Change | Trend | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | PNB | EQ | 2017-05-15 | 169.15 | 169.95 | 171.30 | 166.2 | 166.85 | 166.80 | 168.94 | 7476618 | 1.263123e+09 | 41641 | 1352589 | 18.09 | May | 2017 | 0.000000 | Slight or No change |
1 | PNB | EQ | 2017-05-16 | 166.80 | 167.35 | 178.00 | 164.8 | 173.20 | 174.15 | 173.13 | 51532461 | 8.921904e+09 | 254027 | 5804867 | 11.26 | May | 2017 | 0.044065 | Slight or No change |
2 | PNB | EQ | 2017-05-17 | 174.15 | 173.00 | 173.10 | 164.3 | 164.80 | 165.05 | 167.27 | 26536944 | 4.438942e+09 | 116512 | 7037498 | 26.52 | May | 2017 | -0.052254 | Slight or No change |
3 | PNB | EQ | 2017-05-18 | 165.05 | 163.40 | 164.40 | 157.9 | 158.25 | 158.50 | 161.06 | 13905573 | 2.239613e+09 | 76746 | 3245443 | 23.34 | May | 2017 | -0.039685 | Slight or No change |
4 | PNB | EQ | 2017-05-19 | 158.50 | 159.50 | 162.25 | 154.1 | 155.20 | 155.20 | 157.59 | 16311602 | 2.570540e+09 | 82394 | 2816963 | 17.27 | May | 2017 | -0.020820 | Slight or No change |
Find the average and median values of the column ‘Total Traded Quantity’ for each of the types of ‘Trend’. {Hint : use ‘groupby()’ on the ‘Trend’ column and then calculate the average and median values of the column ‘Total Traded Quantity’}
print('Group wise Medians:')
print(df.loc[:,['Trend', 'Total Traded Quantity']].groupby('Trend').agg(np.median))
print('\n\nGroup wise Means:')
print(df.loc[:,['Trend', 'Total Traded Quantity']].groupby('Trend').agg(np.mean))
Group wise Medians: Total Traded Quantity Trend Slight or No change 18963362 Group wise Means: Total Traded Quantity Trend Slight or No change 2.515651e+07
SAVE the dataframe with the additional columns computed as a csv file week2.csv. In Module 2, you are going to get familiar with matplotlib, the python module which is used to visualize data.
df.to_csv('week2.csv')