# Industry Stock Prices

> Comparing average returns on stocks across industries.

In this assignment, we conduct a descriptive analysis of stock returns across the industries of S&P 500 companies. For the analysis, we use a dataset from Kaggle (kaggle.com) that lists the S&P 500 companies and their industries. The dataset also contains the companies' sub-industries, but we focus on the main industries (e.g. Health Care and Information Technology). Additionally, we download monthly stock prices for 2023 from Yahoo Finance.

The `yfinance` package must be installed to run this notebook (for example with `pip install yfinance`).

Imports and set magics:

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
from ipywidgets import interact
import seaborn as sns
import plotly.graph_objects as go
from datetime import datetime

# Autoreload modules when code is run
%load_ext autoreload
%autoreload 2

# Install yfinance if it is missing by uncommenting the next line
# %pip install yfinance
import yfinance as yf

# user written modules
import dataproject as dp

plt.rcParams.update({"axes.grid":True,"grid.color":"black","grid.alpha":"0.25","grid.linestyle":"--"})
plt.rcParams.update({'font.size': 14})


# Reading and cleaning data

Import a CSV file that contains S&P 500 companies and their industries.

In [None]:
# Read file, sort values in alphabetical order and reset index
SP500 = (pd.read_csv('sp500-companies.csv', encoding='ISO-8859-1')
         .sort_values(by=['Ticker'],ascending=True)
         .reset_index(drop=True))

# Drop columns we don't need
drop_columns = ['Sub-Industry', 'Headquarters Location', 'Date added', 'Founded']
SP500.drop(drop_columns, axis=1, inplace=True)

# Remove duplicate tickers, keeping the first occurrence
SP500 = SP500.drop_duplicates(subset='Ticker', keep='first').reset_index(drop=True)

# Display dataframe
SP500.head()

Create a list of the tickers to pass as input to yfinance.

In [None]:
# Create a list of yfinance tickers 
SP500_tickers = list(SP500['Ticker'])
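The S&P 500 file may write share-class tickers with a dot (for example `BRK.B`), whereas Yahoo Finance uses a hyphen (`BRK-B`). Tickers in the wrong notation fail to download and would typically end up as all-NaN columns that the cleaning step removes. A minimal sketch of the conversion, assuming the CSV uses the dot notation; `to_yahoo_ticker` is a hypothetical helper, not part of yfinance:

```python
import pandas as pd

def to_yahoo_ticker(ticker: str) -> str:
    """Hypothetical helper: convert dot share-class notation (BRK.B) to hyphen notation (BRK-B)."""
    return ticker.replace('.', '-')

# Made-up ticker column standing in for SP500['Ticker']
tickers = pd.Series(['AAPL', 'BRK.B', 'BF.B', 'MSFT'], name='Ticker')

# Apply the conversion to every ticker
yahoo_tickers = tickers.map(to_yahoo_ticker).tolist()
print(yahoo_tickers)  # ['AAPL', 'BRK-B', 'BF-B', 'MSFT']
```

If the tickers are converted, the same mapping should also be applied to `SP500['Ticker']`, so that the later merge on `Ticker` still matches the downloaded columns.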

Download data for 2023 from Yahoo Finance and extract adjusted close prices. The adjusted close price is the closing price after adjustments for all applicable splits and dividend distributions.

In [None]:
# Download monthly historical market data for 2023
# auto_adjust=False keeps the 'Adj Close' column
# (recent yfinance versions default to auto_adjust=True, which drops it)
hist_prices = yf.download(tickers=SP500_tickers,
                          start='2023-01-01',
                          end='2023-12-31',
                          interval='1mo',
                          auto_adjust=False)

# Keep only the adjusted close price for each stock
hist_prices = hist_prices['Adj Close']

# Make sure the index is a DatetimeIndex (one row per month)
hist_prices.index = pd.to_datetime(hist_prices.index)

# Display DataFrame
hist_prices.head()

Clean the data by removing stocks (columns) with missing prices, e.g. tickers that could not be downloaded.

In [None]:
# Remove columns (tickers) that contain any NaN values
hist_prices_clean = hist_prices.dropna(axis=1)
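Note that `dropna(axis=1)` uses the default `how='any'`, so it removes every column with at least one missing price, not only columns that are completely empty. The toy example below, with made-up prices, shows the difference; which behaviour is preferable depends on whether stocks with only a partial 2023 history should be kept.

```python
import numpy as np
import pandas as pd

# Made-up monthly prices for three tickers
prices = pd.DataFrame(
    {'AAA': [10.0, 11.0, 12.0],
     'BBB': [np.nan, 20.0, 21.0],        # listed only from the second month
     'CCC': [np.nan, np.nan, np.nan]},   # failed download: entirely empty
    index=pd.date_range('2023-01-01', periods=3, freq='MS'),
)

# Default how='any': drop every column with at least one NaN
any_dropped = prices.dropna(axis=1)
# how='all': drop only columns that are completely empty
all_dropped = prices.dropna(axis=1, how='all')

print(any_dropped.columns.tolist())  # ['AAA']
print(all_dropped.columns.tolist())  # ['AAA', 'BBB']
```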

To compare the stocks, we calculate monthly and cumulative returns from the adjusted close prices.

In [None]:
# Calculate monthly and cumulative returns 
monthly_returns, cumulative_returns = dp.calculate_returns(hist_prices_clean)

# Set the first row of the cumulative returns to 1
cumulative_returns.iloc[0] = 1

# Display DataFrame
cumulative_returns.head()
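The helper `dp.calculate_returns` lives in the user-written `dataproject` module, which is not shown here. A minimal sketch of what such a function might compute, assuming simple monthly returns $r_t = P_t / P_{t-1} - 1$ and cumulative returns as the growth of one unit invested, $\prod_{s \le t} (1 + r_s)$. The name `calculate_returns_sketch` is hypothetical and is not the module's actual implementation:

```python
import pandas as pd

def calculate_returns_sketch(prices: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Hypothetical stand-in for dp.calculate_returns.

    prices: adjusted close prices, one column per ticker, one row per month.
    Returns (monthly_returns, cumulative_returns).
    """
    # Simple return from one month to the next; the first row is NaN
    monthly = prices / prices.shift(1) - 1
    # Growth of 1 unit invested at the first price; treating the first
    # month's return as 0 makes the cumulative series start at 1
    cumulative = (1 + monthly.fillna(0)).cumprod()
    return monthly, cumulative

# Made-up prices for one ticker
prices = pd.DataFrame({'AAA': [100.0, 110.0, 99.0]},
                      index=pd.date_range('2023-01-01', periods=3, freq='MS'))
monthly, cumulative = calculate_returns_sketch(prices)
print(cumulative['AAA'].round(4).tolist())  # [1.0, 1.1, 0.99]
```

Under this assumption, filling the first month with a zero return gives the same starting value as the notebook's `cumulative_returns.iloc[0] = 1`.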


Create a dictionary that groups the company tickers by industry.

In [None]:
# Group tickers by industry
grouped_companies = {}
for index, row in SP500.iterrows():
    if row['Industry'] in grouped_companies:
        grouped_companies[row['Industry']].append(row['Ticker'])
    else:
        grouped_companies[row['Industry']] = [row['Ticker']]

print(grouped_companies)    
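The same dictionary can be built without an explicit loop using pandas `groupby`. A sketch on a made-up frame with the same two columns (`Ticker`, `Industry`); `sort=False` keeps the industries in order of first appearance, as the loop does:

```python
import pandas as pd

# Made-up stand-in for the SP500 DataFrame
companies = pd.DataFrame({
    'Ticker': ['AAPL', 'CVX', 'MSFT', 'XOM'],
    'Industry': ['Information Technology', 'Energy', 'Information Technology', 'Energy'],
})

# One list of tickers per industry, in first-appearance order
grouped = companies.groupby('Industry', sort=False)['Ticker'].apply(list).to_dict()
print(grouped)
# {'Information Technology': ['AAPL', 'MSFT'], 'Energy': ['CVX', 'XOM']}
```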

## Exploring the dataset

To explore the data, we first visualise the cumulative returns of the stocks in a chosen industry.

**Interactive plot:**

In [None]:
# Dropdown with the industries in the S&P 500 file
industries = list(grouped_companies.keys())
dropdown = widgets.Dropdown(options=industries, description='Industry:')

def plot_industry_returns(cumulative_returns, industry):
    """Plot the cumulative returns of every stock in the selected industry."""
    # Keep only tickers that survived the cleaning step
    tickers = [t for t in grouped_companies[industry] if t in cumulative_returns.columns]
    fig = go.Figure()
    for ticker in tickers:
        fig.add_trace(go.Scatter(x=cumulative_returns.index, y=cumulative_returns[ticker],
                                 mode='lines', name=ticker))

    fig.update_layout(title=f'Cumulative Returns of S&P 500 Companies: {industry}',
                      xaxis_title='Date',
                      yaxis_title='Cumulative Returns',
                      hovermode='x unified')
    fig.show()

# The data is fixed; only the industry is chosen interactively
widgets.interact(plot_industry_returns,
                 cumulative_returns=widgets.fixed(cumulative_returns),
                 industry=dropdown)

Next, we visually compare the cumulative returns of two or more selected companies.

In [None]:
# Tickers that survived the cleaning step (the columns of cumulative_returns)
companies = cumulative_returns.columns.tolist()

# Create multi-select widget for selecting multiple companies
company_dropdown = widgets.SelectMultiple(options=companies, value=[companies[0]], description='Select Companies')

# Create interactive widget using widgets.interact
widgets.interact(dp.plot_cumulative_returns, cumulative_returns=widgets.fixed(cumulative_returns), selected_companies=company_dropdown)

# Merge data sets

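Merge the cumulative returns with the company information in the SP500 DataFrame.

In [None]:
# Transpose to one row per ticker and turn the ticker index into a 'Ticker' column
cumulative_long = cumulative_returns.transpose().rename_axis('Ticker').reset_index()

# Attach the company information (e.g. industry) to each ticker
merged_data = pd.merge(cumulative_long, SP500, on='Ticker')
merged_data.head()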
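The default inner merge silently drops tickers that appear in only one of the two tables, for example tickers removed during cleaning. A sketch with made-up data showing how an outer merge with `indicator=True` can list those tickers:

```python
import pandas as pd

# Made-up data: 'CCC' was removed during cleaning, so it only appears in the company list
returns_long = pd.DataFrame({'Ticker': ['AAA', 'BBB'], 'Jan': [1.0, 1.0], 'Feb': [1.05, 0.97]})
info = pd.DataFrame({'Ticker': ['AAA', 'BBB', 'CCC'],
                     'Industry': ['Energy', 'Utilities', 'Energy']})

# Outer merge; the '_merge' column records which table each row came from
check = pd.merge(returns_long, info, on='Ticker', how='outer', indicator=True)
unmatched = check.loc[check['_merge'] != 'both', 'Ticker'].tolist()
print(unmatched)  # ['CCC']
```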

# Analysis

In [None]:
# Group the merged data by industry
grouped_returns = merged_data.groupby('Industry')

# Summary statistics per industry for each month
print(grouped_returns.describe())


To get a quick overview of the data, we show **summary statistics** aggregated by industry.

Below, we plot the summary statistics for two chosen industries, Energy and Information Technology.

In [None]:
# Here we chose Energy and Information Technology (selected by label, not by row position)
summary_stats = grouped_returns.describe().loc[['Energy', 'Information Technology']]

# Transpose so the statistics are on the x-axis and the industries form the legend
ax = summary_stats.T.plot(kind='bar', figsize=(10, 6))
ax.set_title('Summary Statistics')
ax.set_ylabel('Values')
ax.set_xlabel('Statistics')
plt.xticks(rotation=45)
ax.legend(title='Industries', fontsize='small', loc='upper right', bbox_to_anchor=(1.15, 1))
plt.tight_layout()
plt.show()


We can also compare two industries directly, using an interactive bar chart that displays the mean and standard deviation of their monthly returns. The mean return measures the average performance of an industry, and the standard deviation measures the volatility of its returns.

In [None]:
# Calculate monthly returns (the cumulative returns are not needed here)
monthly_returns, _ = dp.calculate_returns(hist_prices_clean)

# Map each ticker to its industry and take the equal-weighted average
# monthly return of the stocks in each industry
ticker_to_industry = SP500.set_index('Ticker')['Industry']
grouped_returns = monthly_returns.T.groupby(ticker_to_industry).mean().T

# Print summary statistics
print(grouped_returns.describe())

# Interactive widgets for selecting industries
industry_options = grouped_returns.columns.tolist()
industry1_widget = widgets.Dropdown(options=industry_options, description='Industry 1:')
industry2_widget = widgets.Dropdown(options=industry_options, description='Industry 2:')

# Function to update plot based on selected industries
def update_plot(industry1, industry2):
    dp.plot_summary_statistics(grouped_returns, industry1, industry2)

# Display interactive widgets and plot
interact(update_plot, industry1=industry1_widget, industry2=industry2_widget)
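The industry comparison relies on mapping each ticker to its industry before averaging. A minimal sketch with made-up returns, assuming an equal-weighted average (every stock counts the same, regardless of its market value):

```python
import pandas as pd

# Made-up monthly returns: rows are months, columns are tickers
monthly = pd.DataFrame({'AAA': [0.02, -0.01], 'BBB': [0.04, 0.03], 'CCC': [0.00, 0.05]},
                       index=pd.to_datetime(['2023-02-01', '2023-03-01']))
# Made-up ticker -> industry mapping (stand-in for SP500.set_index('Ticker')['Industry'])
industry_of = pd.Series({'AAA': 'Energy', 'BBB': 'Energy', 'CCC': 'Utilities'})

# Group the tickers (rows after transposing) by industry, average, and transpose back
industry_returns = monthly.T.groupby(industry_of).mean().T
print(industry_returns)

# Mean and (sample) standard deviation of the monthly returns per industry
summary = industry_returns.agg(['mean', 'std'])
print(summary)
```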

# Conclusion

By importing and cleaning the S&P 500 company list and the 2023 stock prices, we can analyze stock returns across industries. The interactive widgets let us choose any industry and inspect both graphical and numerical summary statistics.