# S&P 100 Case Study
A chance to apply the techniques you learned in the course Introduction to Python for Finance to the S&P 100 data.

#### Import Libraries and dependencies

In [None]:
# Import matplotlib.pyplot with the alias plt
import matplotlib.pyplot as plt
%matplotlib inline

# Import numpy with the alias np
import numpy as np

# Import pandas with the alias pd
import pandas as pd

# Render figures at 200 dpi so plots display legibly in the DataSpell IDE
plt.rcParams['figure.dpi'] = 200

# Introducing the dataset
You're going to use what you've learned about Python to conduct a financial analysis of stocks for the companies in the Standard and Poor's S&P 100. The S&P 100 is a stock market index made up of one hundred major companies in the United States that span multiple industries.
<br>
#### S&P 100 Case Study
Within the S&P 100, companies are grouped into sectors. For example, the largest sector is consumer discretionary, which includes companies like Amazon.com and Nike. The next largest sectors are information technology, health care, and financials.
<br>
In this case study, we'll analyze all the S&P 100 companies as well as sector-specific subsets of them.
<br>
#### The data
For each company, we have data on its name, sector, stock price, and earnings per share, abbreviated EPS. The earnings per share is the profit for each share of stock. Our objective for the first part of our case study is to analyze growth expectations of companies within the S&P 100 by calculating the price to earnings ratio of each company.
<br>
#### Price to Earnings Ratio
The price to earnings (P/E) ratio is used to measure growth expectations of stocks. It is the dollar amount you can expect to invest in a company in order to receive one dollar of the company's earnings. Mathematically, it is the price per share divided by the earnings per share (EPS). A higher P/E ratio is generally associated with higher growth expectations.
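As a quick worked example, the formula can be written as a small Python function. The name `pe_ratio` and the price and EPS values below are hypothetical and are not taken from the S&P 100 dataset.

```python
def pe_ratio(price_per_share, earnings_per_share):
    """Return the price to earnings ratio: price per share divided by EPS."""
    return price_per_share / earnings_per_share

# Hypothetical stock: $150 per share with $5 of earnings per share
example_pe = pe_ratio(150.0, 5.0)
print(example_pe)  # 30.0
```

Read this as: an investor pays $30 for every $1 of the company's earnings.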

# Lists
Stocks in the S&P 100 are selected to represent sector balance and market capitalization. To begin, let's take a look at the data we have for each S&P 100 company.

In [None]:
# Read data from csv
sp100_csv = pd.read_csv('../data/sp100_data.csv')

In [None]:
# Convert csv data to lists
names = sp100_csv['Name'].tolist()
sectors = sp100_csv['Sector'].tolist()
prices = sp100_csv['Price'].tolist()
earnings = sp100_csv['EPS'].tolist()

# First four items of names
print(names[:4])

# Print information on last company
print(names[-1])
print(prices[-1])
print(earnings[-1])
print(sectors[-1])
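The cell above relies on standard Python list indexing: `names[:4]` excludes index 4, `-1` refers to the last item, and the four lists stay aligned because each row of the CSV lands at the same position in every list. A small sketch with hypothetical company names (not the actual CSV contents):

```python
# Hypothetical stand-ins for the names and sectors lists read from the CSV
names = ['Company A', 'Company B', 'Company C', 'Company D', 'Company E']
sectors = ['Information Technology', 'Consumer Staples', 'Financials',
           'Information Technology', 'Health Care']

# Slicing: indices 0, 1, 2 and 3 (the stop index 4 is excluded)
first_four = names[:4]
print(first_four)  # ['Company A', 'Company B', 'Company C', 'Company D']

# Negative indexing: -1 is the last item in each list
last_name = names[-1]
last_sector = sectors[-1]
print(last_name, last_sector)  # Company E Health Care

# Parallel lists share positions, so zip() can pair them into a lookup table
name_to_sector = dict(zip(names, sectors))
print(name_to_sector['Company B'])  # Consumer Staples
```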

# Arrays and NumPy
NumPy is a scientific computing package in Python that helps you work with arrays. Let's use array operations to calculate the price to earnings ratios of the S&P 100 stocks.

In [None]:
# Convert lists to arrays
prices_array = np.array(prices)
earnings_array = np.array(earnings)

# Calculate P/E ratio
price_to_earnings_ratio = prices_array / earnings_array
print(price_to_earnings_ratio)
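Converting the lists to arrays is what makes the one-line division above work: Python lists do not support elementwise `/`, while NumPy arrays do. One caveat worth checking in real data is that a company with zero EPS produces `inf`, and negative EPS produces a negative P/E. The prices and earnings below are hypothetical.

```python
import numpy as np

prices = [150.0, 60.0, 45.0]
earnings = [5.0, 3.0, 0.0]  # hypothetical; the third company has zero EPS

# Plain lists do not support elementwise division
try:
    prices / earnings
except TypeError as err:
    print('list division fails:', err)

prices_array = np.array(prices)
earnings_array = np.array(earnings)

# Silence the divide-by-zero RuntimeWarning; that element becomes inf
with np.errstate(divide='ignore'):
    pe = prices_array / earnings_array
print(pe)  # [30. 20. inf]

# Keep only finite, positive ratios before summarizing
valid = np.isfinite(pe) & (pe > 0)
print(pe[valid])  # [30. 20.]
```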

# A closer look at the sectors
Now that you have calculated the price to earnings ratios of all companies within the S&P 100, let's look at sector-specific trends. First, we need to extract sector-specific subsets from the larger S&P 100 dataset. Let's review how to select specific elements from a larger array.
 1. Create a boolean filtering array
    - Boolean arrays can be used to index other arrays. To create one, perform a conditional test on an array: the test is applied to each element, and the results are returned as an array of `True`/`False` values. This boolean array can then be used for filtering.
 2. Apply the filtering array to subset another array
    - Once you have your boolean array, you can use it to index another array of the same length, keeping only the elements at positions marked `True`. In this case study, you will use filtering arrays to subset the P/E ratios associated with specific sectors in the S&P 100.
 3. Summarize P/E ratios
    - Once you subset the P/E ratios for specific sectors, you can use NumPy functions to calculate their mean and standard deviation.

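The three steps above can be sketched on a tiny hypothetical dataset. The arrays `sectors_array` and `pe_array` here are made-up stand-ins shaped like the notebook's data.

```python
import numpy as np

# Hypothetical parallel arrays: one sector label and one P/E ratio per company
sectors_array = np.array(['Information Technology', 'Consumer Staples',
                          'Information Technology', 'Financials'])
pe_array = np.array([25.0, 20.0, 35.0, 12.0])

# Step 1: the comparison runs element by element and returns a boolean array
is_it = sectors_array == 'Information Technology'
print(is_it)  # [ True False  True False]

# Step 2: indexing with the boolean array keeps only positions marked True
it_pe = pe_array[is_it]
print(it_pe)  # [25. 35.]

# Step 3: summarize the subset
print(np.mean(it_pe), np.std(it_pe))  # 30.0 5.0
```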
# Filtering arrays
In this section, you will focus on two sectors:
- Information Technology
- Consumer Staples

In [None]:
# Convert lists to numpy arrays (price_to_earnings_ratio is already an array; np.array() returns a copy)
names_array = np.array(names)
sectors_array = np.array(sectors)
price_to_earnings_ratio_array = np.array(price_to_earnings_ratio)

In [None]:
# Create boolean array
boolean_array = (sectors_array == 'Information Technology')

# Subset sector-specific data
info_tech_names = names_array[boolean_array]
info_tech_price_to_earnings_ratio = price_to_earnings_ratio_array[boolean_array]

# Display sector names
print(info_tech_names)
print(info_tech_price_to_earnings_ratio)

In [None]:
# Create boolean array
boolean_array = (sectors_array == 'Consumer Staples')

# Subset sector-specific data
consumer_staples_names = names_array[boolean_array]
consumer_staples_price_to_earnings_ratio = price_to_earnings_ratio_array[boolean_array]

# Display sector names
print(consumer_staples_names)
print(consumer_staples_price_to_earnings_ratio)
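The two cells above repeat the same mask-and-subset pattern with a different sector label. One way to avoid the duplication is a small helper; `subset_sector` is a hypothetical name, and the arrays below are made-up stand-ins for `sectors_array`, `names_array` and `price_to_earnings_ratio_array`.

```python
import numpy as np

def subset_sector(sector, sectors_array, names_array, pe_array):
    """Hypothetical helper: return the names and P/E ratios for one sector."""
    mask = sectors_array == sector
    return names_array[mask], pe_array[mask]

# Hypothetical data shaped like the notebook's arrays
sectors_array = np.array(['Information Technology', 'Consumer Staples',
                          'Information Technology'])
names_array = np.array(['Company A', 'Company B', 'Company C'])
pe_array = np.array([25.0, 20.0, 35.0])

for sector in ['Information Technology', 'Consumer Staples']:
    sector_names, sector_pe = subset_sector(sector, sectors_array, names_array, pe_array)
    print(sector, sector_names, sector_pe)
```

Computing the mask once per sector also guarantees that the names and ratios are filtered with exactly the same condition.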

# Summarizing sector data
Calculate the mean and standard deviation of P/E ratios for the Information Technology and Consumer Staples sectors.

In [None]:
# Calculate mean and standard deviation
info_tech_price_to_earnings_ratio_mean = np.mean(info_tech_price_to_earnings_ratio)
info_tech_price_to_earnings_ratio_std = np.std(info_tech_price_to_earnings_ratio)

print('Information Technology P/E ratio')
print(f'mean: {info_tech_price_to_earnings_ratio_mean}')
print(f'std: {info_tech_price_to_earnings_ratio_std}')

In [None]:
# Calculate mean and standard deviation
consumer_staples_price_to_earnings_ratio_mean = np.mean(consumer_staples_price_to_earnings_ratio)
consumer_staples_price_to_earnings_ratio_std = np.std(consumer_staples_price_to_earnings_ratio)

print('Consumer Staples P/E ratio')
print(f'mean: {consumer_staples_price_to_earnings_ratio_mean}')
print(f'std: {consumer_staples_price_to_earnings_ratio_std}')
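Note that `np.std` computes the population standard deviation by default (`ddof=0`, dividing by n), whereas pandas `Series.std` defaults to the sample standard deviation (`ddof=1`, dividing by n - 1). For small sectors such as these, the difference can be noticeable. A sketch on hypothetical P/E ratios:

```python
import numpy as np

pe = np.array([20.0, 25.0, 30.0, 35.0])  # hypothetical P/E ratios

population_std = np.std(pe)       # ddof=0: divides the squared deviations by n
sample_std = np.std(pe, ddof=1)   # ddof=1: divides the squared deviations by n - 1
print(round(population_std, 4), round(sample_std, 4))
```

Which one to report depends on whether the sector's companies are treated as the whole population or as a sample.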

# Plot P/E ratios
Let's take a closer look at the P/E ratios using a scatter plot for each company in these two sectors. Also, each company name has been assigned a numeric ID contained in the arrays `info_tech_id` and `consumer_staples_id`.

In [None]:
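# Assign each company a numeric ID from 0 to n - 1, where n is the number of companies in the sector
info_tech_id = np.arange(len(info_tech_price_to_earnings_ratio))
consumer_staples_id = np.arange(len(consumer_staples_price_to_earnings_ratio))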

In [None]:
# Make a scatterplot
plt.scatter(info_tech_id, info_tech_price_to_earnings_ratio, color='blue', label='Info Tech')
plt.scatter(consumer_staples_id, consumer_staples_price_to_earnings_ratio, color='orange', label='Consumer Staples')

# Add legend
plt.legend()

# Add labels
plt.xlabel('Company ID')
plt.ylabel('P/E Ratio')
plt.show()

# Visualizing Trends
Did you notice in your scatterplot that one ratio is much higher than the others, an outlier? In this part of the case study, let's take a closer look to determine the name of the company.

1. Make a histogram
    - Remember that histograms can help you look at the spread of data. As a first step to taking a closer look at the IT sector, let's make a histogram of its price to earnings ratios. To plot a histogram, you can use the `hist()` function from the `pyplot` module. You'll also need to define the number of `bins` for the histogram plot.
2. Identify the Outlier
    - Based on the histogram, you'll identify the P/E ratio outlier. Using this P/E ratio, you can subset the company's data. The final step in this case study is to identify the name of the company with the abnormally high P/E ratio within the IT sector.

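To see the counts behind a histogram numerically, `np.histogram` returns the number of values in each bin along with the bin edges; matplotlib's `hist()` uses it to bin the data. The P/E values below are hypothetical, with one deliberately extreme value.

```python
import numpy as np

pe = np.array([12.0, 15.0, 18.0, 20.0, 22.0, 25.0, 70.0])  # hypothetical P/E ratios

# Four equal-width bins between the minimum (12) and maximum (70)
counts, edges = np.histogram(pe, bins=4)
print(counts)  # values per bin
print(edges)   # bins + 1 edges
```

An isolated non-empty bin far to the right of the others, separated by empty bins, is the visual signature of an outlier.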
# Histogram of P/E ratios
To visualize and understand the distribution of the P/E ratios in the IT sector, you can use a histogram.

In [None]:
# Plot histogram
plt.hist(info_tech_price_to_earnings_ratio, bins=8)

# Add x-label
plt.xlabel('P/E ratio')

# Add y-label
plt.ylabel('Frequency')

# Show plot
plt.show()

# Name the outlier
You've found that a company in the Information Technology sector has a P/E ratio greater than 50. Let's find out which company it is.

In [None]:
# Boolean mask for IT P/E ratios greater than 50
outlier_mask = info_tech_price_to_earnings_ratio > 50

# P/E ratio(s) above 50 (a ratio, not a price)
outlier_pe_ratio = info_tech_price_to_earnings_ratio[outlier_mask]

# Name(s) of the company with a P/E ratio > 50
outlier_name = info_tech_names[outlier_mask]

# Display results
print(f"In 2017, {outlier_name[0]} had an abnormally high P/E ratio of {outlier_pe_ratio[0]:.2f}.")
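The threshold of 50 was read off the histogram by eye. A common data-driven alternative, swapped in here and not part of the original case study, is Tukey's fences: flag values above Q3 + 1.5 × IQR, where IQR is the interquartile range. A sketch on hypothetical data:

```python
import numpy as np

pe = np.array([12.0, 15.0, 18.0, 20.0, 22.0, 25.0, 70.0])  # hypothetical P/E ratios
names = np.array(['A', 'B', 'C', 'D', 'E', 'F', 'G'])      # hypothetical names

# First and third quartiles (NumPy's default linear interpolation)
q1, q3 = np.percentile(pe, [25, 75])
iqr = q3 - q1

# Upper Tukey fence: values above it are flagged as high outliers
upper_fence = q3 + 1.5 * iqr

outlier_mask = pe > upper_fence
print(names[outlier_mask], pe[outlier_mask])
```

Here Q1 = 16.5 and Q3 = 23.5, so the fence sits at 34 and only `G` is flagged; with the real IT-sector ratios the fence would land at a different value.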