## Introduction 
This analysis examines the 2015 stock performance of S&P 500 companies. There were 252 trading days in 2015, so full-year performance is measured over those days only. The analysis comprises three main parts:

1. Calculate daily returns
2. Perform a correlation analysis of the companies' stock returns
3. Implement a clustering algorithm
## Analysis
## Part 1. Stock Returns
### Calculate daily return
In this part, the report computes the daily return, in percent, for each company. It then answers the following questions:

1. Which companies experienced the maximum and minimum daily returns?
2. What caused the events identified in point 1?
3. Which companies performed best and worst over the year as a whole?
4. Which companies exhibited the most and least volatility, based on the standard deviation of their returns over the year?
Daily returns start on day 2 (5 January 2015), so that day 1 (2 January 2015) provides a baseline price for comparison. From day 2 onwards, the daily return is the percentage change in the closing price:

$$x_t = \frac{p_t - p_{t-1}}{p_{t-1}}$$

where $p_t$ is the closing price on trading day $t$.

The pandas method `pct_change` computes this percentage change directly for a Series or DataFrame. The code is as follows:

#### Calculate day-to-day return for all companies 

```python
# Import required packages
import csv

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Closing prices: one row per trading day, one column per ticker
filename = 'SP_500_close_2015.csv'
priceData = pd.read_csv(filename, index_col=0)

# Read company names into a dictionary: symbol -> [name, sector]
def readNamesIntoDict():
    d = dict()
    # 'with' closes the file once it has been read
    with open("SP_500_firms.csv", newline="") as f:
        for row in csv.DictReader(f):
            d[row['Symbol']] = [row['Name'], row['Sector']]
    return d

companyNames = readNamesIntoDict()

# Daily returns (dailyret); the first row is NaN because day 1 has no previous price
dailyret = priceData.pct_change(1)
```
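As a quick sanity check, the sketch below compares `pct_change(1)` with the formula above on a tiny, made-up price table (the tickers `AAA`/`BBB` and their prices are hypothetical). Both routes should agree, and the first row is `NaN` because day 1 has no previous close, which is why returns start on day 2.

```python
import pandas as pd

# Hypothetical closing prices for two tickers over four trading days
prices = pd.DataFrame(
    {"AAA": [10.0, 11.0, 9.9, 9.9], "BBB": [50.0, 45.0, 54.0, 27.0]},
    index=pd.to_datetime(["2015-01-02", "2015-01-05", "2015-01-06", "2015-01-07"]),
)

# Library route: each row compared with the previous row
lib = prices.pct_change(1)

# Manual route: x_t = (p_t - p_{t-1}) / p_{t-1}
manual = (prices - prices.shift(1)) / prices.shift(1)

print(lib)
```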

#### Calculate full-year return for all companies
The analysis calculates each company's return over the whole year, using the closing price on the first trading day of 2015 (2 January 2015) and on the last trading day of the year (31 December 2015).

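```python
# Full-year change in share price: a lag of len(priceData) - 1 rows compares
# 31 December with 2 January, so only the last row is non-NaN.
# how='all' keeps that row even if a few tickers lack a full year of prices.
yearlyret = priceData.pct_change(len(priceData) - 1).dropna(how='all')
```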
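A minimal sketch, on made-up prices, showing that a lag of `len(prices) - 1` rows gives the same result as dividing the last close by the first close; the direct form is arguably easier to read. The tickers and values are hypothetical.

```python
import pandas as pd

# Hypothetical closes: first and last rows stand in for 2 Jan and 31 Dec 2015
prices = pd.DataFrame(
    {"AAA": [20.0, 22.0, 25.0], "BBB": [40.0, 38.0, 30.0]},
    index=pd.to_datetime(["2015-01-02", "2015-06-30", "2015-12-31"]),
)

# Lag-based form used above: only the last row has a value
yearly_lag = prices.pct_change(len(prices) - 1).dropna(how="all")

# Direct form: last close / first close - 1
yearly_direct = prices.iloc[-1] / prices.iloc[0] - 1

print(yearly_direct.sort_values())  # worst to best full-year performer
```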

#### Top Daily Stock Performance
The three largest single-day gains of the year are identified with the following function:

```python
# FUNCTIONS
# Find the 3 companies with the biggest increase in the dataframe.
# The result gives the company name, date of occurrence and change in value.
def maxrise(retdf):
    # Each company's largest return, ascending (companies with no data dropped)
    maxname = retdf.max().dropna().sort_values()
    rows = []
    for i in range(-1, -4, -1):              # last three entries = three biggest rises
        code = maxname.index[i]
        if len(retdf) == 1:                  # one row: the input is a full-year return
            rows.append({'Percentage (%)': maxname.iloc[i] * 100,
                         'Company Name': companyNames[code][0],
                         'Code': code})
        else:                                # several rows: the input is daily returns
            rows.append({'Date': retdf[code].idxmax(),   # the day this company peaked
                         'Equity': code,
                         'Firm Name': companyNames[code][0],
                         'Percentage (%)': maxname.iloc[i] * 100})
    # DataFrame.append was removed in pandas 2.0, so rows are collected first
    return pd.DataFrame(rows)

# Biggest rise in a day
maxrise(dailyret)
```

|   | Date | Equity | Firm Name | Percentage (%) |
|---|------|--------|-----------|----------------|
| 0 | 2015-08-27 | FCX | Freeport-McMoran Cp & Gld | 28.66162 |
| 1 | 2015-06-22 | WMB | Williams Cos. | 25.899876 |
| 2 | 2015-10-14 | TRIP | TripAdvisor | 25.53606 |
2,2015-10-14,TRIP,TripAdvisor,25.53606


#### Analysis
The biggest daily gain of the year occurred on 27/08/2015 for Freeport-McMoRan (NYSE: FCX). The share price rose from 7.8894 (eod 26/08/2015) to 10.1506 (eod 27/08/2015), an increase of 28.66%. The jump followed the company's announcement of further cost cuts in response to market conditions, including a forecast that total capital expenditure would fall by $700 million (Freeport-McMoRan Inc., 2016). In addition, it was announced on 27/08/2015 that the influential investor Carl Icahn had bought a large stake in the company, signalling his belief that the company's performance would improve (Miller & Benoit, 2015). Together, these two factors strongly lifted market sentiment towards the company and drove the large move in the share price that day.
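The quoted percentage can be checked directly from the two closing prices given in the text, using the daily return formula:

```python
# Closing prices for FCX quoted in the text (eod 26/08/2015 and eod 27/08/2015)
p_prev, p_t = 7.8894, 10.1506

# x_t = (p_t - p_{t-1}) / p_{t-1}
daily_return = (p_t - p_prev) / p_prev
print(f"{daily_return:.2%}")  # 28.66%
```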

#### Worst Daily Stock Performance
To find the worst single-day performances, the analysis uses the following function:

```python
# Find the 3 companies with the biggest fall in the dataframe.
# The result gives the company name, change in value and the date.
def maxdrop(retdf):
    # Each company's lowest return, ascending (companies with no data dropped)
    minname = retdf.min().dropna().sort_values()
    rows = []
    for i in range(0, 3):                    # first three entries = three biggest falls
        code = minname.index[i]
        if len(retdf) == 1:                  # one row: the input is a full-year return
            rows.append({'Code': code,
                         'Company Name': companyNames[code][0],
                         'Percentage (%)': minname.iloc[i] * 100})
        else:                                # several rows: the input is daily returns
            rows.append({'Date': retdf[code].idxmin(),   # the day this company bottomed out
                         'Equity': code,
                         'Firm Name': companyNames[code][0],
                         'Percentage (%)': minname.iloc[i] * 100})
    return pd.DataFrame(rows)

# Biggest fall in a day
maxdrop(dailyret)
```

|   | Date | Equity | Firm Name | Percentage (%) |
|---|------|--------|-----------|----------------|
| 0 | 2015-10-16 | PWR | Quanta Services Inc. | -28.50057 |
| 1 | 2015-05-27 | KORS | Michael Kors Holdings | -24.195412 |
| 2 | 2015-07-24 | BIIB | BIOGEN IDEC Inc. | -22.080247 |


#### Analysis

The biggest daily fall of the year occurred on 16/10/2015 for Quanta Services Inc. (NYSE: PWR). The share price plummeted from 26.21 (eod 15/10/2015) to 18.74 (eod 16/10/2015), a decrease of 28.5%. The fall followed the company's announcement of preliminary third-quarter results, in which it lowered its full-year revenue forecast by $200 million. This reflected a wider reduction in operating margins in its Oil and Gas Infrastructure Services segment, which was in turn affected by a difficult macroeconomic environment (Quanta Services, 2015). Interestingly, as of 12/10/2016 the company's shares were trading at 27.65, above their pre-fall level, which suggests that investor appetite for the shares has since recovered.
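The same formula confirms the size of the fall, and how far the October 2016 price quoted above sits relative to the pre-fall close:

```python
# Closing prices for PWR quoted in the text
pre_fall, post_fall, oct_2016 = 26.21, 18.74, 27.65

fall = (post_fall - pre_fall) / pre_fall      # one-day return on 16/10/2015
recovery = oct_2016 / pre_fall - 1            # 12/10/2016 price vs pre-fall close

print(f"{fall:.2%}")      # -28.50%
print(f"{recovery:.2%}")  # 5.49%
```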

#### Stock Price Volatility

To evaluate each stock's price volatility, the analysis requires the mean and standard deviation of every company's daily returns in 2015. The mean is defined as:  
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n}x_{i}$$  

The standard deviation is defined as:  
$$\sigma = \sqrt{\frac{\sum\limits_{i=1}^{n} \left(x_{i} - \bar{x}\right)^{2}} {n-1}}$$  
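A minimal pandas sketch of these two statistics, on hypothetical returns: `DataFrame.std()` uses the n-1 denominator (`ddof=1`) by default, matching the formula above, and `idxmax`/`idxmin` on the result pick out the most and least volatile tickers. The `manual_std` line reproduces the formula term by term as a cross-check.

```python
import pandas as pd

# Hypothetical daily returns; the first row is NaN, as produced by pct_change
dailyret = pd.DataFrame(
    {"AAA": [None, 0.01, -0.01, 0.01, -0.01],
     "BBB": [None, 0.05, -0.04, 0.06, -0.05],
     "CCC": [None, 0.00, 0.00, 0.00, 0.00]},
)

mean_ret = dailyret.mean()  # NaN is skipped, so this averages days 2..n
vol = dailyret.std()        # sample standard deviation, ddof=1

# Same quantity written out: sqrt( sum (x_i - mean)^2 / (n - 1) )
manual_std = (((dailyret - mean_ret) ** 2).sum() / (dailyret.count() - 1)) ** 0.5

most_volatile = vol.idxmax()
least_volatile = vol.idxmin()
print(most_volatile, least_volatile)
```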

The code below does not compute these statistics directly; it calculates the pairwise correlations between the companies' daily returns, which Part 2 relies on:

```python
# Calculate the correlations between all stocks using daily returns.
# Row 0 of dailyret is NaN (no previous price), so it is skipped;
# otherwise np.corrcoef would return NaN for every pair.
col = dailyret.columns.tolist()
returns = dailyret.iloc[1:]

# Fast way: NumPy computes the full n*n matrix in one call
def calcCorr():
    cor1 = np.corrcoef(returns, rowvar=False)
    return pd.DataFrame(cor1, columns=col, index=col)

# How the calculation works in general (slow, for illustration only):
# r = (n*Σxy - Σx*Σy) / (sqrt(n*Σx² - (Σx)²) * sqrt(n*Σy² - (Σy)²))
def calcCorr2():
    n = returns.shape[0]                     # number of daily returns (days 2..252)
    cor = pd.DataFrame(index=col, columns=col, dtype=float)
    for i in range(len(col)):
        x = returns.iloc[:, i]
        for j in range(len(col)):
            y = returns.iloc[:, j]
            dividend = n * (x * y).sum() - x.sum() * y.sum()
            divisor1 = np.sqrt(n * (x ** 2).sum() - x.sum() ** 2)
            divisor2 = np.sqrt(n * (y ** 2).sum() - y.sum() ** 2)
            cor.iloc[i, j] = dividend / (divisor1 * divisor2)
    return cor

corTable = calcCorr()
cor = corTable
```
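To check that the fast and slow routes agree, the sketch below applies the textbook Pearson formula used in `calcCorr2` to simulated returns and compares it with `np.corrcoef` and pandas' `DataFrame.corr()`. The data are random and purely illustrative. One practical difference: `DataFrame.corr()` excludes missing values pairwise, whereas `np.corrcoef` returns `NaN` for any pair involving a column that contains a missing value.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Simulated returns: BBB partly follows AAA, CCC is independent noise
aaa = rng.normal(0, 0.01, 250)
returns = pd.DataFrame({
    "AAA": aaa,
    "BBB": 0.8 * aaa + rng.normal(0, 0.006, 250),
    "CCC": rng.normal(0, 0.01, 250),
})

def pearson(x, y):
    """Textbook Pearson correlation, the same algebra as calcCorr2."""
    n = len(x)
    num = n * np.sum(x * y) - np.sum(x) * np.sum(y)
    den = (np.sqrt(n * np.sum(x ** 2) - np.sum(x) ** 2)
           * np.sqrt(n * np.sum(y ** 2) - np.sum(y) ** 2))
    return num / den

fast = pd.DataFrame(np.corrcoef(returns.to_numpy(), rowvar=False),
                    index=returns.columns, columns=returns.columns)
pandas_way = returns.corr()  # Pearson by default
manual_ab = pearson(returns["AAA"].to_numpy(), returns["BBB"].to_numpy())

print(fast.round(3))
```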

## Part 2. Correlation Analysis

#### Company Correlation
To print the correlation between two companies' returns, the analysis uses the following function:

```python
# Let a user print two companies' full names
# and the correlation between their returns
def compandcor():
    # The function asks for the user's input; symbols are normalised to upper case
    name1 = input("Please enter stock symbol of FIRST company: ").strip().upper()
    name2 = input("Please enter stock symbol of SECOND company: ").strip().upper()
    # Look up the correlation for the entered companies
    if name1 in corTable.index and name2 in corTable.columns:
        print("Company Name 1: " + companyNames[name1][0] + " \n"
              + "Company Name 2: " + companyNames[name2][0] + " \n"
              + "Correlation: " + str(corTable.loc[name1, name2]))
        return None
    # Returned if a stock symbol is entered incorrectly or is not in the list
    return ("Error. Use correct format (Example: AAPL, MSFT, fb, zts, etc. No spaces.). "
            "Please also ensure that the entered company symbol was in S&P 500 for 2015.")

compandcor()
```

```
Please enter stock symbol of FIRST company: AAPL
Please enter stock symbol of SECOND company: MSFT
Company Name 1: Apple Inc. 
Company Name 2: Microsoft Corp. 
Correlation: 0.523573201089
```
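Because `compandcor()` relies on `input()`, it is awkward to reuse or test. A non-interactive variant could take the symbols as arguments instead; in the sketch below, the function name `pair_correlation`, the toy matrix and the sector labels are all hypothetical. It normalises case and whitespace before looking the pair up with `.loc`.

```python
import pandas as pd

def pair_correlation(corr, names, sym1, sym2):
    """Hypothetical non-interactive variant of compandcor().

    corr  : square DataFrame of correlations indexed by ticker
    names : dict mapping ticker -> [company name, sector]
    """
    s1, s2 = sym1.strip().upper(), sym2.strip().upper()  # accept ' aapl', 'Msft', ...
    if s1 not in corr.index or s2 not in corr.columns:
        return "Error: symbol not found in the correlation table"
    return (f"Company Name 1: {names[s1][0]}\n"
            f"Company Name 2: {names[s2][0]}\n"
            f"Correlation: {corr.loc[s1, s2]:.4f}")

# Toy inputs: the 0.52 coefficient and the sector labels are made up
tickers = ["AAPL", "MSFT"]
corr = pd.DataFrame([[1.0, 0.52], [0.52, 1.0]], index=tickers, columns=tickers)
names = {"AAPL": ["Apple Inc.", "Sector A"], "MSFT": ["Microsoft Corp.", "Sector B"]}

print(pair_correlation(corr, names, " aapl", "MSFT"))
```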


#### Highest and Lowest Correlation with a Company
To find the companies with the highest and lowest correlation to a given company, the analysis uses the following function:

```python
# Let a user enter a stock symbol and print the full names of the two companies
# with which it has the highest and the lowest correlation respectively.
def bestandworstcor(name):
    compName = companyNames[name][0]
    others = cor.loc[name].drop(name)        # exclude the stock's correlation with itself
    top, largest = companyNames[others.idxmax()][0], others.max()
    bottom, lowest = companyNames[others.idxmin()][0], others.min()
    print("Highest Correlation Company with " + compName + " is " + top
          + " with " + str(largest) + " \n"
          + "Lowest Correlation Company with " + compName + " is " + bottom
          + " with " + str(lowest))

bestandworstcor('AMZN')
```

```
Highest Correlation Company with Amazon.com Inc is Alphabet Inc Class A with 0.58555313236 
Lowest Correlation Company with Amazon.com Inc is Stericycle Inc with 0.0564506179566
```


```python
bestandworstcor('MSFT')
```

```
Highest Correlation Company with Microsoft Corp. is Marsh & McLennan with 0.604548883151 
Lowest Correlation Company with Microsoft Corp. is Stericycle Inc with 0.0288867601988
```


```python
bestandworstcor('FB')
```

```
Highest Correlation Company with Facebook is Fiserv Inc with 0.61966671131 
Lowest Correlation Company with Facebook is Newmont Mining Corp. (Hldg. Co.) with -0.00283227001625
```


```python
bestandworstcor('AAPL')
```

```
Highest Correlation Company with Apple Inc. is Illinois Tool Works with 0.601265434285 
Lowest Correlation Company with Apple Inc. is Range Resources Corp. with 0.112710875584
```


```python
bestandworstcor('GOOGL')
```

```
Highest Correlation Company with Alphabet Inc Class A is Alphabet Inc Class C with 0.989365040395 
Lowest Correlation Company with Alphabet Inc Class A is Transocean with 0.00952277504792
```
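To look beyond the single highest and lowest match, the same row-based lookup extends naturally to the k most and least correlated stocks with `nlargest`/`nsmallest`. The helper `neighbours` and the 4x4 matrix below are hypothetical illustrations.

```python
import pandas as pd

def neighbours(corr, symbol, k=3):
    """Hypothetical helper: k most and k least correlated tickers (self excluded)."""
    row = corr.loc[symbol].drop(symbol)
    return row.nlargest(k), row.nsmallest(k)

# Toy symmetric correlation matrix for four made-up tickers
corr = pd.DataFrame(
    [[1.0, 0.9, 0.2, -0.1],
     [0.9, 1.0, 0.3, 0.0],
     [0.2, 0.3, 1.0, 0.5],
     [-0.1, 0.0, 0.5, 1.0]],
    index=list("ABCD"), columns=list("ABCD"),
)

top, bottom = neighbours(corr, "A", k=2)
print(top)
print(bottom)
```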


Analysis of highly correlated companies shows the following:

Amazon: 'Alphabet Inc Class A', 0.58555

Amazon's highest correlation is with Alphabet Inc. (the parent company of Google). This is not surprising, as both companies are leaders in the technology sector, focusing on innovative products and services aimed at the mass market. Their share prices are therefore likely to be driven by the same factors that affect technology companies (e.g. drops or surges in demand, advances in technology).


Microsoft:       'Procter & Gamble', 0.60311

Microsoft shares are highly correlated with Procter & Gamble. This is a somewhat surprising result, as the two companies operate in different industries. One possible reason is that the companies collaborate, with P&G using tools produced by Microsoft to make its operations more efficient (Doctolero, 2014). A more likely explanation, however, is that both are giants in their respective industries with a large number of customers worldwide, which means they are similarly exposed to macroeconomic trends (e.g. unemployment figures, economic growth, foreign exchange movements).


Facebook: 'Fiserv Inc', 0.61967

Facebook's highest correlation for 2015 is with Fiserv, an American provider of financial services technology. Both companies are in the technology services sector, so it is not surprising that their share prices moved similarly throughout the year.


Apple: 'Illinois Tool Works', 0.60127

Apple's most highly correlated company in 2015 was Illinois Tool Works, an industrial manufacturer. Its industry and specialisation differ from Apple's, so the high correlation between the two is difficult to explain, and the research conducted largely found no explanation for it. One suggestion is that the stocks are correlated because one of the areas Illinois Tool Works specialises in is Test & Measurement and Electronics, which is somewhat similar to the area Apple operates in (Chemi & Wells, 2015). However, this alone cannot explain such a high correlation, so this is perhaps an example where relying purely on data could be misleading.


Alphabet Inc.*: 'Alphabet Inc Class C', 0.98937
*Alphabet Inc., formed on October 2, 2015, is the parent company of Google.

Alphabet's (GOOGL) most highly correlated stock is, unsurprisingly, the company's own Class C shares (GOOG, Alphabet Inc Class C). Both share classes are affected by almost exactly the same factors, so their correlation is very close to 1.