# Trade-Ahead-Prediction  

## Objective  
This project focuses on analyzing stock price data and financial indicators for companies listed on the New York Stock Exchange (NYSE). The goal is to group stocks based on their characteristics using clustering techniques and provide actionable insights into the features of each group.  

Trade&Ahead, a financial consultancy firm, aims to use this analysis to deliver personalized investment strategies to its customers.  

---

## Data Description  
The dataset contains information on stock prices and financial indicators such as ROE, earnings per share, and P/E ratio. Below is a summary of the data fields:  

- **Ticker Symbol**: Abbreviation used to uniquely identify publicly traded shares of a particular stock on a stock market.  
- **Security**: Name of the company.  
- **GICS Sector**: The specific economic sector assigned to a company by the Global Industry Classification Standard (GICS) that best defines its business operations.  
- **GICS Sub Industry**: The specific sub-industry group assigned to a company by the Global Industry Classification Standard (GICS) that best defines its business operations.  
- **Current Price**: Current stock price in dollars.  
- **Price Change**: Percentage change in the stock price over the past 13 weeks.  
- **Volatility**: Standard deviation of the stock price over the past 13 weeks.  
- **ROE**: A measure of financial performance calculated by dividing net income by shareholders' equity (shareholders' equity is equal to a company's assets minus its debt).  
- **Cash Ratio**: The ratio of a company's total reserves of cash and cash equivalents to its total current liabilities.  
- **Net Cash Flow**: The difference between a company's cash inflows and outflows (in dollars).  
- **Net Income**: Revenues minus expenses, interest, and taxes (in dollars).  
- **Earnings Per Share**: Company's net profit divided by the number of common shares it has outstanding (in dollars).  
- **Estimated Shares Outstanding**: The number of shares of the company's stock currently held by all its shareholders.  
- **P/E Ratio**: Ratio of the company's current stock price to the earnings per share.  
- **P/B Ratio**: Ratio of the company's stock price per share to its book value per share (the book value of a company is the difference between its total assets and total liabilities).  
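
As a quick check of the definitions above, the following sketch recomputes two fields for American Airlines Group (AAL) from the figures in the dataset's first row. Treating Estimated Shares Outstanding as net income divided by earnings per share is an assumption inferred from the EPS definition; the source does not state how that field was derived.

```python
# Figures for AAL, taken from the first row of the dataset
current_price = 42.349998       # dollars
earnings_per_share = 11.39      # dollars
net_income = 7_610_000_000      # dollars

# P/E Ratio: current stock price divided by earnings per share
pe_ratio = current_price / earnings_per_share

# Assumption: shares outstanding estimated by inverting the EPS definition
# (EPS = net income / shares outstanding)
estimated_shares = net_income / earnings_per_share

print(f"P/E ratio: {pe_ratio:.6f}")
print(f"Estimated shares outstanding: {estimated_shares:,.0f}")
```

Both results agree closely with the `P/E Ratio` and `Estimated Shares Outstanding` values recorded for AAL in the dataset.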

---

## Exploratory Data Analysis (EDA) Questions  

1. **What does the distribution of stock prices look like?**  
   Analyze the spread and skewness of stock prices to understand pricing patterns.  

2. **Which economic sector's stocks have seen the largest price increase on average?**  
   Identify sectors showing significant growth trends based on price changes.  

3. **How are the different variables correlated with each other?**  
   Use correlation analysis to identify relationships between financial indicators (e.g., ROE, P/E Ratio) and stock performance.  

4. **How does the average cash ratio vary across economic sectors?**  
   Examine the ability of companies in each sector to meet short-term obligations using cash equivalents.  

5. **How does the P/E ratio vary, on average, across economic sectors?**  
   Analyze P/E ratios to determine how investors value stocks in different sectors relative to earnings.  
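
The questions above mostly reduce to a skewness measure, a correlation matrix, and per-sector averages. The sketch below shows one possible way to compute them with pandas. The helper names `sector_average` and `eda_summary` are hypothetical, and the sample frame holds made-up values for illustration only; the column names are the ones used in the dataset.

```python
import pandas as pd


def sector_average(df: pd.DataFrame, column: str) -> pd.Series:
    """Mean of `column` per GICS sector, largest first."""
    return df.groupby("GICS Sector")[column].mean().sort_values(ascending=False)


def eda_summary(df: pd.DataFrame) -> dict:
    """Collect the numeric summaries behind the five EDA questions."""
    numeric = df.select_dtypes(include="number")
    return {
        "price_skew": df["Current Price"].skew(),                      # Q1
        "price_change_by_sector": sector_average(df, "Price Change"),  # Q2
        "correlation": numeric.corr(),                                 # Q3
        "cash_ratio_by_sector": sector_average(df, "Cash Ratio"),      # Q4
        "pe_by_sector": sector_average(df, "P/E Ratio"),               # Q5
    }


# Made-up values for illustration only (not taken from the dataset)
sample = pd.DataFrame(
    {
        "GICS Sector": ["A", "A", "B", "B"],
        "Current Price": [10.0, 20.0, 30.0, 100.0],
        "Price Change": [1.0, 3.0, 10.0, 20.0],
        "Cash Ratio": [10, 20, 50, 70],
        "P/E Ratio": [12.0, 18.0, 30.0, 40.0],
    }
)

summary = eda_summary(sample)
print(summary["price_change_by_sector"])
```

On the real data, the same function could be called on the loaded DataFrame, and each result passed to a plotting call such as a histogram, bar plot, or heatmap.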

---

In [1]:
import numpy as np
import pandas as pd

# Libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# To scale the features with StandardScaler
from sklearn.preprocessing import StandardScaler

# To compute pairwise distances
from scipy.spatial.distance import cdist, pdist

# To perform k-means clustering and compute silhouette scores
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# To perform hierarchical clustering, compute cophenetic correlation, and create dendrograms
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram, linkage, cophenet

# To ignore unnecessary warnings
import warnings
warnings.filterwarnings("ignore")

In [3]:
# Mount Google Drive (Colab only)
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
path='/content/drive/MyDrive/Python Course'

In [None]:
# Store the palette name for future use
pellete='Set2'
colors = sns.color_palette(pellete)  # Get the Set2 color palette for future use
sns.set(style="darkgrid")  # Set the grid style

## **Data Overview**

- Observe the first few rows of the dataset to check that it has been loaded properly
- Get the number of rows and columns in the dataset
- Check the data types of the columns to ensure that the data is stored in the preferred format and that each field holds the expected kind of value
- Check the statistical summary of the dataset to get an overview of the numerical columns
- Check for missing (null) values
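
The checklist above can be sketched as a single helper. This is a minimal illustration under stated assumptions: the function name `overview` is hypothetical, and the sample frame uses made-up values rather than rows from `stock_data.csv`.

```python
import pandas as pd


def overview(df: pd.DataFrame) -> dict:
    """Gather the basic checks from the Data Overview list."""
    return {
        "head": df.head(),                            # first few rows
        "shape": df.shape,                            # (rows, columns)
        "dtypes": df.dtypes.astype(str).to_dict(),    # data type per column
        "summary": df.describe(),                     # stats for numeric columns
        "missing": df.isnull().sum().to_dict(),       # missing values per column
    }


# Made-up values for illustration only
sample = pd.DataFrame(
    {
        "Ticker Symbol": ["AAA", "BBB", "CCC"],
        "Current Price": [10.0, None, 30.0],
    }
)

report = overview(sample)
print(report["shape"])
print(report["missing"])
```

In pandas, `isnull()` and `isna()` are aliases, so a single call covers both "missing" and "null" values.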

In [5]:
# Load the data into a pandas DataFrame
trad_df=pd.read_csv(f'{path}/stock_data.csv')

In [6]:
# Deep copy the DataFrame to avoid changing the original
trade_ahead_df=trad_df.copy(deep=True)

In [8]:
# Detailed information about the dataset
trade_ahead_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 340 entries, 0 to 339
Data columns (total 15 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Ticker Symbol                 340 non-null    object 
 1   Security                      340 non-null    object 
 2   GICS Sector                   340 non-null    object 
 3   GICS Sub Industry             340 non-null    object 
 4   Current Price                 340 non-null    float64
 5   Price Change                  340 non-null    float64
 6   Volatility                    340 non-null    float64
 7   ROE                           340 non-null    int64  
 8   Cash Ratio                    340 non-null    int64  
 9   Net Cash Flow                 340 non-null    int64  
 10  Net Income                    340 non-null    int64  
 11  Earnings Per Share            340 non-null    float64
 12  Estimated Shares Outstanding  340 non-null    float64
 13  P/E R

In [10]:
trade_ahead_df.shape

(340, 15)

In [11]:
trade_ahead_df.head()

| | Ticker Symbol | Security | GICS Sector | GICS Sub Industry | Current Price | Price Change | Volatility | ROE | Cash Ratio | Net Cash Flow | Net Income | Earnings Per Share | Estimated Shares Outstanding | P/E Ratio | P/B Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AAL | American Airlines Group | Industrials | Airlines | 42.349998 | 9.999995 | 1.687151 | 135 | 51 | -604000000 | 7610000000 | 11.39 | 668129900.0 | 3.718174 | -8.784219 |
| 1 | ABBV | AbbVie | Health Care | Pharmaceuticals | 59.240002 | 8.339433 | 2.197887 | 130 | 77 | 51000000 | 5144000000 | 3.15 | 1633016000.0 | 18.80635 | -8.750068 |
| 2 | ABT | Abbott Laboratories | Health Care | Health Care Equipment | 44.91 | 11.301121 | 1.273646 | 21 | 67 | 938000000 | 4423000000 | 2.94 | 1504422000.0 | 15.27551 | -0.394171 |
| 3 | ADBE | Adobe Systems Inc | Information Technology | Application Software | 93.940002 | 13.977195 | 1.357679 | 9 | 180 | -240840000 | 629551000 | 1.26 | 499643700.0 | 74.555557 | 4.199651 |
| 4 | ADI | Analog Devices, Inc. | Information Technology | Semiconductors | 55.32 | -1.827858 | 1.701169 | 14 | 272 | 315120000 | 696878000 | 0.31 | 2247994000.0 | 178.451613 | 1.05981 |


### Observations

- **Total Entries:** `340`  
- **Total Columns:** `15`  

#### **Column Details:**

1. **`Ticker Symbol`**  
   - **Type:** `Object` (String)  
   - **Description:** Unique identifier for each security.  

2. **`Security`**  
   - **Type:** `Object` (String)  
   - **Description:** Name of the security.  

3. **`GICS Sector`**  
   - **Type:** `Object` (String)  
   - **Description:** Global Industry Classification Standard sector.  

4. **`GICS Sub Industry`**  
   - **Type:** `Object` (String)  
   - **Description:** Subcategory of the GICS sector.  

5. **`Current Price`**  
   - **Type:** `Float`  
   - **Description:** Current stock price.  

6. **`Price Change`**  
   - **Type:** `Float`  
   - **Description:** Percentage change in the stock price over the past 13 weeks.  

7. **`Volatility`**  
   - **Type:** `Float`  
   - **Description:** Standard deviation of the stock price over the past 13 weeks.  

8. **`ROE`**  
   - **Type:** `Int`  
   - **Description:** Return on equity percentage.  

9. **`Cash Ratio`**  
   - **Type:** `Int`  
   - **Description:** Ratio of cash and cash equivalents to total current liabilities.  

10. **`Net Cash Flow`**  
    - **Type:** `Int`  
    - **Description:** Cash flow from operating, investing, and financing activities.  

11. **`Net Income`**  
    - **Type:** `Int`  
    - **Description:** Total profit after expenses.  

12. **`Earnings Per Share`**  
    - **Type:** `Float`  
    - **Description:** Profit per outstanding share.  

13. **`Estimated Shares Outstanding`**  
    - **Type:** `Float`  
    - **Description:** Approximate number of shares currently held by investors.  

14. **`P/E Ratio`**  
    - **Type:** `Float`  
    - **Description:** Price-to-earnings ratio.  

15. **`P/B Ratio`**  
    - **Type:** `Float`  
    - **Description:** Price-to-book ratio.  
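
Since the observations separate four text columns from eleven numeric ones, a natural next step before scaling and clustering is to split the columns by data type. The sketch below illustrates one way to do this with `select_dtypes`; the helper name `split_columns` is hypothetical, and the sample frame reuses a few fields from the first two rows shown earlier.

```python
import pandas as pd


def split_columns(df: pd.DataFrame) -> tuple:
    """Return (numeric column names, non-numeric column names)."""
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    other_cols = df.select_dtypes(exclude="number").columns.tolist()
    return numeric_cols, other_cols


# A few fields from the first two rows of the dataset (AAL and ABBV)
sample = pd.DataFrame(
    {
        "Ticker Symbol": ["AAL", "ABBV"],
        "GICS Sector": ["Industrials", "Health Care"],
        "Current Price": [42.349998, 59.240002],
        "ROE": [135, 130],
    }
)

numeric_cols, other_cols = split_columns(sample)
print("Numeric:", numeric_cols)
print("Non-numeric:", other_cols)
```

Only the numeric columns would typically be passed to `StandardScaler`, while the text columns remain available for labelling and profiling the resulting groups.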
