In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Step 1

### Load the dataset

In [None]:
# Reading in the dataset
stocks = pd.read_csv('/kaggle/input/stock-market-historical-data-of-top-10-companies/data.csv')

# Step 2

### EDA

Now that the `.csv` file has been read and loaded, we need to get familiar with the data and understand what we are working with. This is done through exploratory data analysis (EDA).

In [None]:
# This will show a quick overview of the dataset
stocks.head()

In [None]:
stocks.dtypes

At this point only `Volume` can be summarized statistically, because it is the only column with a numeric data type.


In [None]:
stocks.describe().round(2)
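
By default, `describe()` summarizes numeric columns only, which is why string-typed price columns are left out. As a small sketch on made-up rows (the tickers and values below are hypothetical, not taken from the dataset), passing `include='all'` also reports counts and unique values for the object columns:

```python
import pandas as pd

# Hypothetical sample rows for illustration only
toy = pd.DataFrame({
    'Company': ['AAPL', 'AAPL', 'MSFT'],
    'Open': ['$10.00', '$11.00', '$20.00'],  # still strings, like the raw data
    'Volume': [100, 200, 300],
})

# Default behaviour: only numeric columns are summarized
numeric_summary = toy.describe()

# include='all' adds count/unique/top/freq rows for object columns
full_summary = toy.describe(include='all')

print(numeric_summary.columns.tolist())
print(full_summary.columns.tolist())
```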

The data types show that the `Date`, `Close/Last`, `Open`, `High`, and `Low` columns are stored as `object` (string), so we need to convert them to their correct data types.

In [None]:
# Parse the Date strings into datetimes and drop any time-of-day component
date_time_interval = pd.to_datetime(stocks.Date).dt.normalize()
date_time_interval

In [None]:
stocks.Date = date_time_interval

In [None]:
stocks.dtypes

Now that the `Date` column is in its proper data type, we need to fix the other columns.

In [None]:
# Strip the dollar sign from each price column and convert the result to float
for col in ['Close/Last', 'Open', 'High', 'Low']:
    stocks[col] = stocks[col].str.replace('$', '', regex=False).astype(float)

In [None]:
stocks.dtypes
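
The conversion above assumes every price is a plain dollar amount such as `$150.25`. If a file also contained thousands separators or placeholder text, a more defensive variant could look like the sketch below. The sample rows are made up for illustration, and whether the real dataset contains such values is an assumption, not something confirmed here.

```python
import pandas as pd

# Hypothetical sample rows, not taken from the dataset
sample = pd.DataFrame({
    'Close/Last': ['$150.25', '$1,020.50', 'n/a'],
    'Open': ['$149.00', '$1,000.00', '$12.50'],
})

price_cols = ['Close/Last', 'Open']
for col in price_cols:
    # Remove dollar signs and thousands separators
    cleaned = sample[col].str.replace(r'[$,]', '', regex=True)
    # errors='coerce' turns anything unparseable into NaN instead of raising
    sample[col] = pd.to_numeric(cleaned, errors='coerce')

print(sample.dtypes)
print(sample)
```

Coerced values show up in a later `isna().sum()` check, so bad rows are surfaced rather than silently dropped.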

Let's rename the `Close/Last` column to something easier to type. I will rename it to `Close`.

In [None]:
stocks.rename(columns={'Close/Last' : 'Close'}, inplace=True)

print(stocks)

In [None]:
stocks.Company.value_counts()

In [None]:
stocks.Company.unique().size

In [None]:
stocks.isna().sum()

With this information we see that there are no NaN values in the dataset.

Now that the other columns have been converted to their proper data types, there is more information to analyze.


In [None]:
stocks.describe().round(2)

In [None]:
stocks.head()

In [None]:
# Split the dataset into separate DataFrames based on each company
company_dfs = dict(list(stocks.groupby('Company')))
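
Iterating over a `groupby` object yields `(name, sub-frame)` pairs, which is what makes the split above work. The same idea can be written as a dict comprehension, sketched here on a tiny made-up frame (the company names `AAA` and `BBB` are placeholders):

```python
import pandas as pd

# Hypothetical rows for illustration only
toy = pd.DataFrame({
    'Company': ['AAA', 'BBB', 'AAA', 'BBB'],
    'Close': [10.0, 20.0, 11.0, 19.0],
})

# Each iteration yields the group key and the rows belonging to that key
by_company = {name: frame for name, frame in toy.groupby('Company')}

print(sorted(by_company))
print(by_company['AAA'])
```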

In [None]:
# Plotting the closing prices for each company
for Company, company_df in company_dfs.items():
    plt.plot(company_df['Date'], company_df['Close'], label=Company)

# Customize the plot
plt.xlabel('Date')
plt.ylabel('Closing Price')
plt.title('Closing Stock Prices over Time')
plt.legend()

# Display the plot
plt.show()

This plot shows each company's closing price over time. For a clearer and cleaner visualization, the same data is presented in [Tableau](https://public.tableau.com/views/Historical_Stock_Data_of_the_10_companies/Historicalstockdataofthetop10companies?:language=en-US&:display_count=n&:origin=viz_share_link).

In the Tableau visualization, each stock's trading volume was lighter when its price rose well above its average closing price. This can be an indication that the stock was overvalued at those prices.
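
One way to check this kind of relationship in pandas, rather than by eye, is to correlate volume with how far the close sits above each company's own average. The sketch below uses made-up numbers and a hypothetical `dev_from_mean` column; reading "further away" as the signed distance above the mean is an assumption. A negative correlation would be consistent with lighter volume at higher prices.

```python
import pandas as pd

# Hypothetical rows for illustration only
toy = pd.DataFrame({
    'Company': ['AAA'] * 5,
    'Close': [10.0, 11.0, 12.0, 15.0, 20.0],
    'Volume': [500, 450, 480, 300, 150],
})

# Signed distance of each close from that company's average close
mean_close = toy.groupby('Company')['Close'].transform('mean')
toy['dev_from_mean'] = toy['Close'] - mean_close

# Pearson correlation between that distance and volume, per company
corr_by_company = {
    name: group['dev_from_mean'].corr(group['Volume'])
    for name, group in toy.groupby('Company')
}
print(corr_by_company)
```

A correlation computed this way describes co-movement only; it does not by itself establish that a stock was overvalued.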

There is also a large drop in closing prices during the pandemic, which shows that stock prices can be largely affected by extenuating circumstances that cannot be predicted or controlled.
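
The size of such a drop can be quantified as a peak-to-trough drawdown. The sketch below is a minimal illustration on made-up prices; the February to April 2020 window is an assumption about when the pandemic drop occurred, not something read from the dataset.

```python
import pandas as pd

# Hypothetical prices for a single company, for illustration only
toy = pd.DataFrame({
    'Date': pd.to_datetime(['2020-02-14', '2020-02-28', '2020-03-16', '2020-03-31']),
    'Close': [100.0, 90.0, 70.0, 80.0],
}).sort_values('Date')

# Assumed window around the pandemic sell-off
window = toy[(toy['Date'] >= '2020-02-01') & (toy['Date'] <= '2020-04-30')]

# Drawdown: how far each close sits below the highest close seen so far
running_peak = window['Close'].cummax()
drawdown = window['Close'] / running_peak - 1

# The most negative value is the maximum drawdown in the window
max_drawdown = drawdown.min()
print(f"Max drawdown: {max_drawdown:.1%}")
```

On the real data, the same calculation would be applied to each company's frame separately so that one company's peak is not compared with another's prices.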