# **8.1.4 Data Analysis and 8.1.5 Supplementary Activity**

1. With the earthquakes.csv file, select all the earthquakes in Japan with a magType of mb and a magnitude of 4.9 or greater.

In [1]:
import pandas as pd
earthquakes = pd.read_csv('/content/earthquakes.csv')

In [2]:
# Keep rows whose place mentions Japan (na=False treats a missing place as no match),
# whose magnitude type is 'mb', and whose magnitude is at least 4.9
japan_earthquakes = earthquakes[(earthquakes['place'].str.contains('Japan', na=False)) &
                                (earthquakes['magType'] == 'mb') &
                                (earthquakes['mag'] >= 4.9)]
japan_earthquakes

| | mag | magType | time | place | tsunami | parsed_place |
|---|---|---|---|---|---|---|
| 1563 | 4.9 | mb | 1538977532250 | 293km ESE of Iwo Jima, Japan | 0 | Japan |
| 2576 | 5.4 | mb | 1538697528010 | 37km E of Tomakomai, Japan | 0 | Japan |
| 3072 | 4.9 | mb | 1538579732490 | 15km ENE of Hasaki, Japan | 0 | Japan |
| 3632 | 4.9 | mb | 1538450871260 | 53km ESE of Hitachi, Japan | 0 | Japan |


**The `japan_earthquakes` DataFrame keeps only the earthquakes located in Japan that were measured on the body-wave magnitude scale (`mb`) and have a magnitude of 4.9 or greater. Four rows meet all three conditions.**
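The filter works by combining one boolean Series per condition with `&`. The sketch below shows the same pattern on a few made-up rows; the data is hypothetical and is not taken from earthquakes.csv. It also shows what `na=False` does: a missing `place` counts as "does not mention Japan" instead of producing a NaN inside the mask.

```python
import pandas as pd

# Hypothetical rows standing in for earthquakes.csv
quakes = pd.DataFrame({
    'mag': [4.9, 5.4, 4.5, 5.0, 6.1],
    'magType': ['mb', 'mb', 'mb', 'ml', 'mb'],
    'place': ['15km ENE of Hasaki, Japan', '37km E of Tomakomai, Japan',
              '10km N of Sendai, Japan', None, '100km S of Suva, Fiji'],
})

# Each comparison returns a boolean Series; & combines them element-wise
in_japan = quakes['place'].str.contains('Japan', na=False)  # None -> False
mask = in_japan & (quakes['magType'] == 'mb') & (quakes['mag'] >= 4.9)

japan_mb = quakes[mask]  # only rows where every condition is True
print(japan_mb)
```

The parentheses around each comparison are required. Without them, `&` binds more tightly than `==` and `>=`.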

2. Create bins for each full number of magnitude (for example, the first bin is 0-1, the second is 1-2, and so on) with a magType of ml and count how many are in each bin.

In [3]:
# Keep only earthquakes whose magnitude type is 'ml'
ml_earthquakes = earthquakes[earthquakes['magType'] == 'ml']

# Bin edges 0, 1, ..., 10 give the bins [0, 1), [1, 2), ..., [9, 10)
bins = list(range(11))

# pd.cut() assigns each magnitude to a bin (right=False makes the left edge inclusive);
# value_counts() counts the rows per bin and sort_index() orders the bins by magnitude
earthquake_counts = pd.cut(ml_earthquakes['mag'], bins=bins, right=False).value_counts().sort_index()
earthquake_counts
earthquake_counts

[0, 1)     2072
[1, 2)     3126
[2, 3)      985
[3, 4)      153
[4, 5)        6
[5, 6)        2
[6, 7)        0
[7, 8)        0
[8, 9)        0
[9, 10)       0
Name: mag, dtype: int64

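**The result is a frequency distribution of `ml` magnitudes in one-unit bins. Most fall between 0 and 3, and only 8 reach 4 or more. Any magnitude outside the 0 to 10 range (for example, a negative value) is not assigned to a bin by `pd.cut()`, so it is not counted.**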
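Here is a small sketch of the binning on hypothetical magnitudes. It shows that `right=False` produces left-closed bins and that a value below the first edge is dropped. Extending the edges downward is one way to keep such values; the `-1` starting edge is an illustrative choice, not something the exercise specifies.

```python
import pandas as pd

# Hypothetical ml magnitudes, including one negative value
mags = pd.Series([-0.5, 0.2, 0.9, 1.0, 1.7, 2.3, 4.0])

# Edges 0..10 -> bins [0, 1), [1, 2), ..., [9, 10); 1.0 lands in [1, 2)
counts = pd.cut(mags, bins=list(range(11)), right=False).value_counts().sort_index()
print(counts)

# -0.5 falls outside [0, 10): pd.cut maps it to NaN and value_counts() skips it.
# Starting the edges at -1 adds a [-1, 0) bin that keeps it.
wider = pd.cut(mags, bins=list(range(-1, 11)), right=False).value_counts().sort_index()
print(wider)
```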

3. Using the faang.csv file, group by the ticker and resample to monthly frequency. Make the following aggregations:

Mean of the opening price

Maximum of the high price

Minimum of the low price

Mean of the closing price

Sum of the volume traded

In [4]:
"""
Reading the FAANG data from a CSV file, parsing the 'date' column as dates,
and setting the 'date' column as the index of the DataFrame
"""
faang_data = pd.read_csv('/content/faang.csv', parse_dates=['date'], index_col='date')
# Grouping the FAANG data by ticker and resampling it on a monthly basis
monthly_faang = faang_data.groupby('ticker').resample('M')

# Defining the aggregation functions for the resampled data
aggregations = {
    'open': 'mean',    # Calculating the mean of opening prices
    'high': 'max',     # Finding the maximum high price
    'low': 'min',      # Finding the minimum low price
    'close': 'mean',   # Calculating the mean of closing prices
    'volume': 'sum'    # Summing up the volume traded
}

monthly_agg = monthly_faang.agg(aggregations) # Applying the aggregation functions to the resampled data
monthly_agg

| ticker | date | open | high | low | close | volume |
|---|---|---|---|---|---|---|
| AAPL | 2018-01-31 | 170.71469 | 176.6782 | 161.5708 | 170.699271 | 659679440 |
| AAPL | 2018-02-28 | 164.562753 | 177.9059 | 147.9865 | 164.921884 | 927894473 |
| AAPL | 2018-03-31 | 172.421381 | 180.7477 | 162.466 | 171.878919 | 713727447 |
| AAPL | 2018-04-30 | 167.332895 | 176.2526 | 158.2207 | 167.286924 | 666360147 |
| AAPL | 2018-05-31 | 182.635582 | 187.9311 | 162.7911 | 183.207418 | 620976206 |
| AAPL | 2018-06-30 | 186.605843 | 192.0247 | 178.7056 | 186.508652 | 527624365 |
| AAPL | 2018-07-31 | 188.065786 | 193.765 | 181.3655 | 188.179724 | 393843881 |
| AAPL | 2018-08-31 | 210.460287 | 227.1001 | 195.0999 | 211.477743 | 700318837 |
| AAPL | 2018-09-30 | 220.611742 | 227.8939 | 213.6351 | 220.356353 | 678972040 |
| AAPL | 2018-10-31 | 219.489426 | 231.6645 | 204.4963 | 219.137822 | 789748068 |


**The cell reads the daily FAANG prices from faang.csv and groups them by ticker. It then resamples each group to calendar months and aggregates each month: mean open, maximum high, minimum low, mean close, and total volume traded. Each row is labeled with the month-end date.**
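The same monthly aggregation can also be done without `resample()`. The alternative below groups on the ticker plus the calendar-month period of each date. It is offered as a sketch on hypothetical prices and is not the approach used above. One reason to consider it: recent pandas releases (2.2 and later) deprecate the `'M'` resampling alias in favor of `'ME'`, while `to_period('M')` keeps the same meaning across versions. Rows are labeled by month (e.g. `2018-01`) rather than by month-end date.

```python
import pandas as pd

# Hypothetical daily prices for two tickers (not the real faang.csv values)
dates = pd.to_datetime(['2018-01-02', '2018-01-03', '2018-02-01', '2018-01-02'])
prices = pd.DataFrame({
    'ticker': ['AAA', 'AAA', 'AAA', 'BBB'],
    'open': [10.0, 12.0, 20.0, 5.0],
    'high': [11.0, 13.0, 21.0, 6.0],
    'low': [9.0, 11.0, 19.0, 4.0],
    'close': [10.5, 12.5, 20.5, 5.5],
    'volume': [100, 200, 300, 50],
}, index=pd.DatetimeIndex(dates, name='date'))

aggregations = {'open': 'mean', 'high': 'max', 'low': 'min',
                'close': 'mean', 'volume': 'sum'}

# Group keys: the 'ticker' column plus the month period of each date in the index
monthly = prices.groupby(['ticker', prices.index.to_period('M')]).agg(aggregations)
print(monthly)
```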

4. Build a crosstab with the earthquake data between the tsunami column and the magType column. Rather than showing the frequency count, show the maximum magnitude that was observed for each combination. Put the magType along the columns.

In [5]:
# Cross-tabulate tsunami (rows) against magType (columns); passing values= and
# aggfunc='max' reports the largest 'mag' in each cell instead of a count
ctmax_magnitude = pd.crosstab(earthquakes['tsunami'],
                              earthquakes['magType'],
                              values=earthquakes['mag'],
                              aggfunc='max')

# Displaying the resulting crosstab
ctmax_magnitude


**This uses `pd.crosstab()` to find the maximum earthquake magnitude for each combination of tsunami flag and magnitude type. For every magnitude type, the result shows the largest magnitude recorded with and without an accompanying tsunami.**
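The sketch below shows how `values=` and `aggfunc=` turn a count table into a maximum table. It uses hypothetical rows and checks the result against the equivalent `pivot_table()` call. Combinations that never occur come out as NaN rather than 0.

```python
import pandas as pd

# Hypothetical rows with the same three columns used above
quakes = pd.DataFrame({
    'tsunami': [0, 0, 1, 1, 0],
    'magType': ['mb', 'ml', 'mb', 'mww', 'mb'],
    'mag': [4.9, 3.1, 6.0, 7.5, 5.2],
})

# Aggregate 'mag' with max for each (tsunami, magType) pair; magType becomes the columns
ct = pd.crosstab(quakes['tsunami'], quakes['magType'],
                 values=quakes['mag'], aggfunc='max')

# Equivalent pivot_table formulation
pt = quakes.pivot_table(index='tsunami', columns='magType',
                        values='mag', aggfunc='max')
print(ct)
```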

5. Calculate the rolling 60-day aggregations of OHLC data by ticker for the FAANG data. Use the same aggregations as exercise no. 3.

In [6]:
"""
Grouping the data by ticker and applying a rolling window of 60 days
The '60D' parameter specifies a rolling window of 60 days
'ticker' is the column used for grouping
This creates a rolling object with a 60-day window for each group (ticker)
"""
rolling_agg = faang_data.groupby('ticker').rolling('60D').agg(aggregations)
rolling_agg

| ticker | date | open | high | low | close | volume |
|---|---|---|---|---|---|---|
| AAPL | 2018-01-02 | 166.927100 | 169.0264 | 166.0442 | 168.987200 | 25555934.0 |
| AAPL | 2018-01-03 | 168.089600 | 171.2337 | 166.0442 | 168.972500 | 55073833.0 |
| AAPL | 2018-01-04 | 168.480367 | 171.2337 | 166.0442 | 169.229200 | 77508430.0 |
| AAPL | 2018-01-05 | 168.896475 | 172.0381 | 166.0442 | 169.840675 | 101168448.0 |
| AAPL | 2018-01-08 | 169.324680 | 172.2736 | 166.0442 | 170.080040 | 121736214.0 |
| ... | ... | ... | ... | ... | ... | ... |
| NFLX | 2018-12-24 | 283.509250 | 332.0499 | 233.6800 | 281.931750 | 525657894.0 |
| NFLX | 2018-12-26 | 281.844500 | 332.0499 | 231.2300 | 280.777750 | 520444588.0 |
| NFLX | 2018-12-27 | 281.070488 | 332.0499 | 231.2300 | 280.162805 | 532679805.0 |
| NFLX | 2018-12-28 | 279.916341 | 332.0499 | 231.2300 | 279.461341 | 521968250.0 |


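**This applies the exercise 3 aggregations over 60-day rolling windows within each ticker. Every row summarizes the trailing 60 calendar days up to and including its date, so the first rows of each ticker cover less than a full window.**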
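A `'60D'` window is defined by calendar time, not by a fixed number of rows. The hypothetical example below makes that visible: for each row, the window covers dates in (date − 60 days, date]. A row 73 days earlier therefore drops out, while one 42 days earlier stays in. Time-based windows require the dates to be sorted within each group.

```python
import pandas as pd

# Hypothetical values; the date index is sorted, as a time-based window requires
df = pd.DataFrame(
    {'ticker': ['A', 'B', 'A', 'A'], 'volume': [1.0, 10.0, 2.0, 4.0]},
    index=pd.DatetimeIndex(
        pd.to_datetime(['2018-01-01', '2018-01-01', '2018-02-01', '2018-03-15']),
        name='date'),
)

# Per ticker, sum the rows whose date falls in (date - 60 days, date]
rolling_sum = df.groupby('ticker')['volume'].rolling('60D').sum()
print(rolling_sum)
```

For ticker A on 2018-03-15, the 2018-01-01 row is 73 days back and falls outside the window. The sum is therefore 2.0 + 4.0.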

6. Create a pivot table of the FAANG data that compares the stocks. Put the ticker in the rows and show the averages of the OHLC and volume traded data.

In [7]:
"""
Creating a pivot table
Specifying 'ticker' as the index for rows in the pivot table
and 'mean' as the aggregation function, which will compute the mean value for each column.
"""
pivot_faang = pd.pivot_table(faang_data, index='ticker', aggfunc='mean')
pivot_faang

| ticker | close | high | low | open | volume |
|---|---|---|---|---|---|
| AAPL | 186.986218 | 188.906858 | 185.135729 | 187.038674 | 34021450.0 |
| AMZN | 1641.726175 | 1662.839801 | 1619.840398 | 1644.072669 | 5649563.0 |
| FB | 171.510936 | 173.615298 | 169.30311 | 171.454424 | 27687980.0 |
| GOOG | 1113.225139 | 1125.777649 | 1101.001594 | 1113.554104 | 1742645.0 |
| NFLX | 319.290299 | 325.224583 | 313.187273 | 319.620533 | 11470300.0 |


**`pivot_faang` has one row per ticker, holding the average open, high, low, close, and volume over the whole dataset. This makes the stocks easy to compare side by side.**
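With only an index and a mean, `pivot_table()` gives the same numbers as a plain `groupby().mean()`. The sketch below checks this on hypothetical prices. One visible difference is column order: `pivot_table()` may reorder the columns, as in the alphabetical output above.

```python
import pandas as pd

# Hypothetical prices for two tickers
prices = pd.DataFrame({
    'ticker': ['AAA', 'AAA', 'BBB', 'BBB'],
    'open': [10.0, 14.0, 5.0, 7.0],
    'close': [11.0, 13.0, 6.0, 8.0],
    'volume': [100.0, 300.0, 50.0, 150.0],
})

# One row per ticker, averaging every remaining numeric column
pivot = pd.pivot_table(prices, index='ticker', aggfunc='mean')

# groupby equivalent; it keeps the original column order
grouped = prices.groupby('ticker').mean()
print(pivot)
```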

7. Calculate the Z-scores for each numeric column of Netflix's data (ticker is NFLX) using apply().

In [8]:
# Select only the Netflix rows from the FAANG data
faang_nflx = faang_data.loc[faang_data['ticker'] == 'NFLX']

# Z-score each price column with apply(): the lambda receives one column at a time
# and returns (x - mean) / std for it
nflx_zscores = faang_nflx[['open', 'high', 'low', 'close']].apply(
    lambda x: x.sub(x.mean()).div(x.std())
)

# Label the rows with the ticker (this replaces the date index);
# faang_data itself is left unchanged
nflx_zscores['ticker'] = 'NFLX'
nflx_zscores = nflx_zscores.set_index('ticker')

nflx_zscores

| ticker | open | high | low | close |
|---|---|---|---|---|
| NFLX | -2.500753 | -2.516023 | -2.410226 | -2.416644 |
| NFLX | -2.380291 | -2.423180 | -2.285793 | -2.335286 |
| NFLX | -2.296272 | -2.406077 | -2.234616 | -2.323429 |
| NFLX | -2.275014 | -2.345607 | -2.202087 | -2.234303 |
| NFLX | -2.218934 | -2.295113 | -2.143759 | -2.192192 |
| ... | ... | ... | ... | ... |
| NFLX | -1.571478 | -1.518366 | -1.627197 | -1.745946 |
| NFLX | -1.735063 | -1.439978 | -1.677339 | -1.341402 |
| NFLX | -1.407286 | -1.417785 | -1.495805 | -1.302664 |
| NFLX | -1.248762 | -1.289018 | -1.297285 | -1.292137 |


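**This filters the FAANG data to Netflix (NFLX) and converts the open, high, low, and close prices to Z-scores, i.e. the number of standard deviations each value lies from its column's mean. It then sets the ticker as the index. Volume, although numeric, is not included.**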
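The exercise asks for every numeric column. One way to cover volume as well is `select_dtypes()`, which picks the numeric columns before `apply()`. This sketch uses hypothetical NFLX rows and keeps the date index rather than replacing it with the ticker. Note that `std()` in pandas is the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

# Hypothetical NFLX rows; the real values would come from faang.csv
nflx = pd.DataFrame({
    'ticker': ['NFLX'] * 4,
    'open': [300.0, 310.0, 320.0, 330.0],
    'close': [305.0, 309.0, 322.0, 334.0],
    'volume': [1e6, 2e6, 3e6, 4e6],
}, index=pd.DatetimeIndex(pd.to_datetime(
    ['2018-12-24', '2018-12-26', '2018-12-27', '2018-12-28']), name='date'))

# Keep every numeric column (volume included, 'ticker' excluded), then
# z-score one column at a time with apply()
zscores = nflx.select_dtypes(include='number').apply(
    lambda col: (col - col.mean()) / col.std()
)
print(zscores)
```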

8. Add event descriptions:
Create a dataframe with the following three columns: ticker, date, and event. The columns should have the following values:

ticker: 'FB'
date: ['2018-07-25', '2018-03-19', '2018-03-20']

event: ['Disappointing user growth announced after close.', 'Cambridge Analytica story', 'FTC investigation']

Set the index to ['date', 'ticker']

Merge this data with the FAANG data using an outer join

In [9]:
# Reload the FAANG data with 'date' parsed as a regular datetime column
faang_data = pd.read_csv('/content/faang.csv', parse_dates=['date'])

# Display it indexed by date (faang_data itself keeps 'date' as a column)
faang_data.set_index('date')

| date | ticker | open | high | low | close | volume |
|---|---|---|---|---|---|---|
| 2018-01-02 | FB | 177.68 | 181.58 | 177.5500 | 181.42 | 18151903 |
| 2018-01-03 | FB | 181.88 | 184.78 | 181.3300 | 184.67 | 16886563 |
| 2018-01-04 | FB | 184.90 | 186.21 | 184.0996 | 184.33 | 13880896 |
| 2018-01-05 | FB | 185.59 | 186.90 | 184.9300 | 186.85 | 13574535 |
| 2018-01-08 | FB | 187.20 | 188.90 | 186.3300 | 188.28 | 17994726 |
| ... | ... | ... | ... | ... | ... | ... |
| 2018-12-24 | GOOG | 973.90 | 1003.54 | 970.1100 | 976.22 | 1590328 |
| 2018-12-26 | GOOG | 989.01 | 1040.00 | 983.0000 | 1039.46 | 2373270 |
| 2018-12-27 | GOOG | 1017.15 | 1043.89 | 997.0000 | 1043.88 | 2109777 |
| 2018-12-28 | GOOG | 1049.62 | 1055.56 | 1033.1000 | 1037.08 | 1413772 |


In [10]:
# Filter Facebook events data
fb_events = faang_data.loc[(faang_data['ticker'] == 'FB') & (faang_data['date'].isin(['2018-07-25', '2018-03-19', '2018-03-20']))]

# Create DataFrame to store Facebook events
fb_events_df = pd.DataFrame(columns=['date', 'ticker', 'event'])

# Populate DataFrame with relevant data
fb_events_df['date'] = fb_events['date']
fb_events_df['ticker'] = fb_events['ticker']

# Add event descriptions based on dates
# (each date gets the event listed in the same position in the exercise)
fb_events_df.loc[fb_events_df['date'] == '2018-07-25', 'event'] = 'Disappointing user growth announced after close.'
fb_events_df.loc[fb_events_df['date'] == '2018-03-19', 'event'] = 'Cambridge Analytica story'
fb_events_df.loc[fb_events_df['date'] == '2018-03-20', 'event'] = 'FTC investigation'

# Merge Facebook events DataFrame with original FAANG data
faang_merged = pd.merge(fb_events_df, faang_data, on=['ticker', 'date'], how='outer')
faang_merged

| | date | ticker | event | open | high | low | close | volume |
|---|---|---|---|---|---|---|---|---|
| 0 | 2018-03-19 | FB | Cambridge Analytica story | 177.010 | 177.17 | 170.06 | 172.56 | 88140060 |
| 1 | 2018-03-20 | FB | FTC investigation | 167.470 | 170.20 | 161.95 | 168.15 | 129851768 |
| 2 | 2018-07-25 | FB | Disappointing user growth announced after close. | 215.715 | 218.62 | 214.27 | 217.50 | 64592585 |
| 3 | 2018-01-02 | FB | | 177.680 | 181.58 | 177.55 | 181.42 | 18151903 |
| 4 | 2018-01-03 | FB | | 181.880 | 184.78 | 181.33 | 184.67 | 16886563 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1250 | 2018-12-24 | GOOG | | 973.900 | 1003.54 | 970.11 | 976.22 | 1590328 |
| 1251 | 2018-12-26 | GOOG | | 989.010 | 1040.00 | 983.00 | 1039.46 | 2373270 |
| 1252 | 2018-12-27 | GOOG | | 1017.150 | 1043.89 | 997.00 | 1043.88 | 2109777 |
| 1253 | 2018-12-28 | GOOG | | 1049.620 | 1055.56 | 1033.10 | 1037.08 | 1413772 |


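**This builds a small DataFrame of three Facebook ('FB') events and outer-joins it with the FAANG data on date and ticker. Every trading day is kept, and the `event` column is filled only on the three event dates. Unlike the exercise statement, the events are merged on `date` and `ticker` columns rather than on a `['date', 'ticker']` index.**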
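A version closer to the exercise statement builds the events table directly and sets its index to `['date', 'ticker']`. It then outer-joins on that index. This is a sketch: the three events come from the exercise, but the price rows are a hypothetical stand-in for faang.csv.

```python
import pandas as pd

# Events exactly as listed in the exercise, indexed on (date, ticker)
events = pd.DataFrame({
    'ticker': 'FB',  # scalar is broadcast to every row
    'date': pd.to_datetime(['2018-07-25', '2018-03-19', '2018-03-20']),
    'event': [
        'Disappointing user growth announced after close.',
        'Cambridge Analytica story',
        'FTC investigation',
    ],
}).set_index(['date', 'ticker'])

# Hypothetical slice of the FAANG data with the same index levels
faang = pd.DataFrame({
    'date': pd.to_datetime(['2018-03-19', '2018-03-20', '2018-03-19']),
    'ticker': ['FB', 'FB', 'AAPL'],
    'close': [100.0, 101.0, 200.0],
}).set_index(['date', 'ticker'])

# Outer join on the shared (date, ticker) index: rows without an event get NaN,
# and the 2018-07-25 event has no price row in this toy slice
merged = faang.join(events, how='outer')
print(merged)
```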

9. Use the transform() method on the FAANG data to represent all the values in terms of the first date in the data. To do so, divide all the values for each ticker by the values for the first date in the data for that ticker. This is referred to as an index, and the data for the first date is the base (https://ec.europa.eu/eurostat/statistics-explained/index.php/Beginners:Statistical_concept_-_Index_and_base_year). When data is in this format, we can easily see growth over time. Hint: transform() can take a function name.

In [11]:
# grouping the data by the 'ticker' column and then applying a transformation.
faang_trans = faang_data.groupby('ticker').transform(lambda x : x / x.iloc[0])
faang_trans

  faang_trans = faang_data.groupby('ticker').transform(lambda x : x / x.iloc[0])


Unnamed: 0,open,high,low,close,volume
0,1.000000,1.000000,1.000000,1.000000,1.000000
1,1.023638,1.017623,1.021290,1.017914,0.930292
2,1.040635,1.025498,1.036889,1.016040,0.764707
3,1.044518,1.029298,1.041566,1.029931,0.747830
4,1.053579,1.040313,1.049451,1.037813,0.991341
...,...,...,...,...,...
1250,0.928993,0.940578,0.928131,0.916638,1.285047
1251,0.943406,0.974750,0.940463,0.976019,1.917695
1252,0.970248,0.978396,0.953857,0.980169,1.704782
1253,1.001221,0.989334,0.988395,0.973784,1.142383


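**Every value is now expressed relative to the ticker's first date in the data (the base, equal to 1), so a value of 1.05 means 5% above the starting value. This makes the growth of the FAANG stocks directly comparable.**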
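The hint notes that `transform()` can take a function name. One way to use it is `transform('first')`, which broadcasts each ticker's first row back onto all of that ticker's rows. A plain element-wise division then produces the index. This is a sketch on hypothetical prices. It assumes rows are already sorted by date within each ticker, and `'first'` takes the first non-missing value per column.

```python
import pandas as pd

# Hypothetical prices, sorted by date within each ticker
prices = pd.DataFrame({
    'ticker': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB'],
    'close': [10.0, 11.0, 12.5, 50.0, 40.0],
    'volume': [100.0, 150.0, 50.0, 20.0, 30.0],
})

numeric_cols = ['close', 'volume']
# Same shape as prices[numeric_cols]; every row holds its ticker's first values
base = prices.groupby('ticker')[numeric_cols].transform('first')

# Divide each value by its ticker's base, so the first date of each ticker becomes 1.0
indexed = prices[numeric_cols] / base
print(indexed)
```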