# Follow-Along Activity

In [None]:
# Upload file
from google.colab import files
uploaded = files.upload()

# Read CSV file
import pandas as pd
df = pd.read_csv('6_data.csv')

Saving 6_data.csv to 6_data.csv


In [None]:
# Display the first few rows
print(df.head(10))

   Quarter    Revenue   Expenses    Profit
0  2015-Q1  202483.57  160755.13  41728.44
1  2015-Q2  201763.13  152336.52  49426.61
2  2015-Q3  208177.46  164022.98  44154.48
3  2015-Q4  235814.63  183691.46  52123.18
4  2016-Q1  208829.23  150269.34  58559.89
5  2016-Q2  211406.48  165821.93  45584.56
6  2016-Q3  223082.03  137797.44  85284.59
7  2016-Q4  243446.63  155610.25  87836.38
8  2017-Q1  218152.63  132864.87  85287.76
9  2017-Q2  225918.83  150250.95  75667.88


In [None]:
print("Pearson Correlation Matrix:")
print(df.corr(numeric_only=True))

Pearson Correlation Matrix:
           Revenue  Expenses    Profit
Revenue   1.000000  0.854505  0.618631
Expenses  0.854505  1.000000  0.120506
Profit    0.618631  0.120506  1.000000


In [None]:
correlation = df['Revenue'].corr(df['Profit'])
print("Correlation between Revenue and Profit:", round(correlation, 2))


Correlation between Revenue and Profit: 0.62


# Your Project

In [None]:
# Upload the Excel file from your local machine
from google.colab import files
uploaded = files.upload()

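# Import io and the pandas library for data handling
import io
import pandas as pd

# Read the 'Income_Statement' sheet from the bytes that were just uploaded.
# Colab renames a re-uploaded file (e.g. '6_data (1).xlsx'), so reading
# '6_data.xlsx' from disk could silently pick up an older copy.
df_excel = pd.read_excel(io.BytesIO(uploaded['6_data.xlsx']), sheet_name='Income_Statement')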

# Display the first 10 rows of the dataset to check the structure
print("Preview of dataset:")
print(df_excel.head(10))

# Calculate and display the Pearson correlation matrix for numeric columns
print("Pearson Correlation Matrix:")
print(df_excel.corr(numeric_only=True))

# Calculate the Pearson correlation between Revenue and Cost_of_Goods_Sold
correlation = df_excel['Revenue'].corr(df_excel['Cost_of_Goods_Sold'])
print("Correlation between Revenue and Cost_of_Goods_Sold:", round(correlation, 2))


Saving 6_data.xlsx to 6_data (1).xlsx
Preview of dataset:
   Quarter    Revenue  Cost_of_Goods_Sold  Operating_Expenses  \
0  2015-Q1  504967.14           325924.83           107032.54   
1  2015-Q2  503544.06           310319.06           136623.83   
2  2015-Q3  516378.84           330960.02           142561.10   
3  2015-Q4  581649.14           368895.91           148976.47   
4  2016-Q1  517658.47           305255.43           143441.46   
5  2016-Q2  522782.40           333682.14           130371.25   
6  2016-Q3  546090.16           280293.81           137763.96   
7  2016-Q4  596749.95           315917.89           144863.50   
8  2017-Q1  536105.26           271689.62           108583.78   
9  2017-Q2  551554.32           302692.76           116261.66   

   Operating_Profit  Net_Profit  
0          72009.77    57607.82  
1          56601.17    45280.94  
2          42857.72    34286.17  
3          63776.77    51021.41  
4          68961.57    55169.26  
5          58729.01   


# Extra: Correlation Types in Python

Below are the three correlation methods available in pandas through `DataFrame.corr(method=...)`: **Pearson**, **Spearman**, and **Kendall**.

---

## 1. Pearson Correlation

### Definition
Measures the strength and direction of a **linear** relationship between two continuous variables. Values range from -1 (perfect negative) to 1 (perfect positive); 0 means no linear relationship.

### When to Use
- Data is continuous and approximately normally distributed (normality matters mainly for significance tests on the coefficient).
- Relationship is linear.

### Example
Relationship between `hours studied` and `exam score`.

### Python Code
```python
import pandas as pd

# Example dataset
data = {
    'hours_studied': [1, 2, 3, 4, 5],
    'exam_score': [50, 55, 65, 70, 75]
}
df = pd.DataFrame(data)

# Pearson correlation
pearson_corr = df.corr(method='pearson')
print(pearson_corr)
```

### Explanation
The coefficient between `hours_studied` and `exam_score` is close to 1 (about 0.99), indicating a strong positive linear relationship.
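As a check on the value above, Pearson's coefficient can be computed directly from its definition, `r = sum((x - mean_x) * (y - mean_y)) / sqrt(sum((x - mean_x)**2) * sum((y - mean_y)**2))`, and compared with pandas. This is a minimal sketch that re-creates the toy hours/score data as NumPy arrays; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same toy data as the Pearson example above
hours = np.array([1, 2, 3, 4, 5], dtype=float)
score = np.array([50, 55, 65, 70, 75], dtype=float)

# Deviations from each variable's mean
dx = hours - hours.mean()
dy = score - score.mean()

# Pearson r: sum of cross-products divided by the product of the root sums of squares
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# pandas computes the same quantity
r_pandas = pd.Series(hours).corr(pd.Series(score), method='pearson')

print(round(r_manual, 4), round(r_pandas, 4))
```

Both values agree, which confirms that `method='pearson'` is just this formula applied to each pair of columns.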

---

## 2. Spearman Rank Correlation

### Definition
Measures the strength and direction of a **monotonic** relationship using **ranked values**. It does not assume a normal distribution.

### When to Use
- Data is ordinal or not normally distributed.
- The relationship is monotonic but not necessarily linear.

### Example
Relationship between `income rank` and `education level`.

### Python Code
```python
# Spearman correlation (reuses `df` from the Pearson example)
spearman_corr = df.corr(method='spearman')
print(spearman_corr)
```

### Explanation
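Spearman checks whether one variable consistently rises (or falls) as the other rises, regardless of how fast it changes. Because `exam_score` increases strictly with `hours_studied`, the Spearman coefficient for that pair is exactly 1.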
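To see how Spearman differs from Pearson, the sketch below uses hypothetical data where `y = x**3`: the relationship is perfectly monotonic but not linear. It also shows that Spearman's coefficient is simply Pearson's coefficient computed on the ranks produced by `DataFrame.rank()`.

```python
import pandas as pd

# Hypothetical data: y always rises with x, but not linearly (y = x cubed)
df_mono = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6],
                        'y': [1, 8, 27, 64, 125, 216]})

pearson = df_mono.corr(method='pearson').loc['x', 'y']
spearman = df_mono.corr(method='spearman').loc['x', 'y']

# Spearman's rho equals Pearson's r computed on the ranks of each column
spearman_from_ranks = df_mono.rank().corr(method='pearson').loc['x', 'y']

print(f"Pearson:          {pearson:.3f}")
print(f"Spearman:         {spearman:.3f}")
print(f"Pearson on ranks: {spearman_from_ranks:.3f}")
```

For this data Pearson comes out around 0.94, while both Spearman values equal 1 because the ranks of `x` and `y` are identical.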

---

## 3. Kendall’s Tau

### Definition
A rank-based measure of correlation built from the numbers of **concordant and discordant pairs** (pairs of observations that both variables order the same way, or in opposite ways). It is often preferred for small datasets.

### When to Use
- Ordinal data or small sample sizes.
- To validate results from Spearman.

### Example
A small dataset of preferences ranked by two reviewers.

### Python Code
```python
# Kendall correlation (reuses `df` from the Pearson example; pandas needs SciPy for this method)
kendall_corr = df.corr(method='kendall')
print(kendall_corr)
```

### Explanation
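Kendall's values are usually smaller in magnitude than Spearman's for the same data, which is why it is described as more conservative. It is preferred for small datasets, data with many tied ranks, or when more robustness is needed.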
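The pair counting behind Kendall's tau can be written out by hand. This sketch uses two hypothetical reviewers ranking five items (the data is made up for illustration) and computes tau-a, `(concordant - discordant) / total_pairs`, which coincides with the tau-b variant when there are no ties.

```python
from itertools import combinations

# Hypothetical rankings of five items by two reviewers (1 = best)
reviewer_a = [1, 2, 3, 4, 5]
reviewer_b = [2, 1, 3, 5, 4]

concordant = discordant = 0
for i, j in combinations(range(len(reviewer_a)), 2):
    # Positive product: both reviewers order items i and j the same way
    sign = (reviewer_a[i] - reviewer_a[j]) * (reviewer_b[i] - reviewer_b[j])
    if sign > 0:
        concordant += 1
    elif sign < 0:
        discordant += 1

n_pairs = len(reviewer_a) * (len(reviewer_a) - 1) // 2
tau = (concordant - discordant) / n_pairs  # tau-a; equals tau-b when there are no ties
print(concordant, discordant, round(tau, 2))
```

Here 8 of the 10 pairs are concordant and 2 are discordant, giving tau = 0.6. For the same two lists, Spearman's rho works out to 0.8, which illustrates why Kendall is called more conservative.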

---

## Summary

| Correlation Type | Measures                     | Use Case                                            | Assumes Normality                   |
|------------------|------------------------------|-----------------------------------------------------|-------------------------------------|
| Pearson          | Linear                       | Continuous data with a linear relationship          | Yes (mainly for significance tests) |
| Spearman         | Monotonic (ranks)            | Ordinal data or non-linear monotonic relationships  | No                                  |
| Kendall          | Monotonic (pair concordance) | Small samples, ordinal data, many tied ranks        | No                                  |

---
