# Exploratory Data Analysis

Exploratory Data Analysis (EDA) examines a dataset to reveal its underlying patterns and relationships.
- EDA is an initial exploration to grasp the data's structure and quirks.
- It helps us understand the different facets of the dataset.
- It helps us draw meaningful conclusions from the data.

A typical EDA workflow has the following steps:

1. Import the data
    - You can load data from a local file, a website, or an API client such as yfinance.
    - You can also use the built-in datasets that some libraries provide.
    - You can read CSV or Excel files with the pandas functions *read_csv()* and *read_excel()*.

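As a rough illustration of the import step, the sketch below reads CSV data with pandas. The CSV text and its column names are made up for this example, and an in-memory buffer stands in for a real file so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk or at a URL.
csv_text = """date,ticker,close,volume
2024-01-02,AAA,10.5,1000
2024-01-03,AAA,10.7,1200
2024-01-04,AAA,,900
"""

# pd.read_csv accepts a local path, a URL, or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# For Excel files the analogous call is pd.read_excel(path),
# which needs an engine such as openpyxl installed.
print(df.shape)
print(df.head())
```
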
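2. Explore the data
    - Use *head()* and *tail()* to view the first and last rows, *columns* and *index* to inspect the labels, *info()* for data types and non-null counts, and *describe()* for summary statistics.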

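The methods listed above could be tried on a small DataFrame like the one below. The city and weather values are invented purely for illustration.

```python
import pandas as pd

# Hypothetical data, used only to have something to inspect.
df = pd.DataFrame({
    "city": ["Ankara", "Berlin", "Cairo", "Delhi", "Edinburgh", "Fukuoka"],
    "temp_c": [21.0, 18.5, 30.2, 33.1, 14.0, 24.3],
    "rain_mm": [3, 12, 0, 1, 20, 8],
})

first_rows = df.head(3)   # first three rows
last_rows = df.tail(2)    # last two rows
print(first_rows)
print(last_rows)

print(df.columns)         # column labels
print(df.index)           # row labels

df.info()                 # dtypes and non-null counts, printed directly

summary = df.describe()   # count, mean, std, min, quartiles, max (numeric columns)
print(summary)
```
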
3. Clean and refine the data
    - Check data types using *info()*.
        - If a numerical feature is stored as a string, convert it (for example, with *pd.to_numeric()*).
        - If a date feature is stored as a string, convert it (for example, with *pd.to_datetime()*).
    - You can change the column or row labels.
        - You can use rename() to change column or row labels.
        - You can use reset_index() or set_index() to change indexes or row labels.
    - Handling missing values.
        - You can use *info()*, *isnull()*, or *notnull()* to detect missing values.
        - You can either drop or fill the missing values.
            - Drop rows or columns with missing values using *dropna()*.
            - Fill in the missing values using *fillna()*.
                - You can fill them with a constant value or the mean/median of the column.
                - You can use imputation methods.
    - Eliminating duplicates.
        - You can use *duplicated()* to flag repeated rows.
        - You can either drop the repeated rows or keep them.
        - You can use *drop_duplicates()* to remove them.
    - Filtering out extraneous noise.
    - Spotting and rectifying anomalies and outliers.
        - You can either remove them or keep them.
        - You can use different definitions of outliers.
            - One common rule uses the interquartile range (IQR = Q3 - Q1): values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are flagged as outliers.
    - Removing redundancies.
        - Drop the columns which do not provide useful information.
    - Scaling data.
        - If the features are on very different scales, it can help to bring them to a common scale.
        - You can use scikit-learn's *StandardScaler()* or *MinMaxScaler()*.

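The cleaning steps above might be chained roughly as follows. This is a minimal sketch on invented data, not a fixed recipe: the column names, the choice of the median for filling, and the 1.5 × IQR cutoff are all assumptions. Scaling is done with plain pandas arithmetic to avoid an extra dependency; *StandardScaler()* and *MinMaxScaler()* apply the same two transformations.

```python
import pandas as pd

# Hypothetical raw data: strings instead of numbers and dates, one missing
# price, one repeated row, and one suspiciously large price.
raw = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-03",
             "2024-01-04", "2024-01-05", "2024-01-06"],
    "Price": ["10", "12", "11", "11", None, "13", "500"],
    "Units": [5, 6, 5, 5, 7, 6, 6],
})

# 1. Fix data types: strings -> datetime / numeric.
df = raw.copy()
df["Date"] = pd.to_datetime(df["Date"])
df["Price"] = pd.to_numeric(df["Price"], errors="coerce")

# 2. Rename column labels.
df = df.rename(columns={"Date": "date", "Price": "price", "Units": "units"})

# 3. Count and drop repeated rows.
n_duplicates = int(df.duplicated().sum())
df = df.drop_duplicates().reset_index(drop=True)

# 4. Count missing values, then fill them with the column median.
n_missing = int(df["price"].isnull().sum())
df["price"] = df["price"].fillna(df["price"].median())

# 5. Keep only rows whose price lies within 1.5 * IQR of the quartiles.
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = df[in_range].reset_index(drop=True)

# 6. Scale the numeric features.
numeric = clean[["price", "units"]]
# Zero mean, unit variance (population std, ddof=0, as StandardScaler uses).
standardized = (numeric - numeric.mean()) / numeric.std(ddof=0)
# Rescale each column to the [0, 1] range, as MinMaxScaler does by default.
min_max = (numeric - numeric.min()) / (numeric.max() - numeric.min())

print(clean)
print(standardized.round(2))
print(min_max.round(2))
```
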
4. Statistical Tools
    - Use statistical tools and graphs to derive insights from the data.
    - Identify the significant features and the relationships among them.
    - You can compute pairwise correlations between numeric features using the method *corr()*.
    - You can display the resulting correlation matrix using *sns.heatmap()*.

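A possible way to combine *corr()* and *sns.heatmap()* is sketched below. The data and the helper name `plot_correlation_heatmap` are hypothetical. The plotting libraries are imported inside the helper so that the numeric part runs without them; call `plot_correlation_heatmap(corr)` to draw the figure.

```python
import pandas as pd

# Hypothetical data with one positive and one negative relationship.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "exam_score": [52, 55, 61, 64, 70, 73],
    "hours_gaming": [6, 5, 5, 3, 2, 1],
})

# Pairwise Pearson correlations between the numeric columns.
corr = df.corr()
print(corr.round(2))


def plot_correlation_heatmap(corr_matrix):
    """Draw a correlation matrix as an annotated heatmap (hypothetical helper)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Fix the color scale to [-1, 1] so colors are comparable across datasets.
    ax = sns.heatmap(corr_matrix, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
    ax.set_title("Feature correlations")
    plt.show()
```
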
5. Visualization
    - Use graphs to derive insights from the data.
    - You can use line plots, scatter plots, histograms, pie charts, and pair plots.

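Four of the plot types named above could be drawn with matplotlib along these lines. The sales data is invented, and the non-interactive `Agg` backend is selected only so the sketch runs without a display; in a notebook that line can be dropped. For pair plots, seaborn's `sns.pairplot(df)` is one option.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; an assumption for script use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical daily sales data.
df = pd.DataFrame({
    "day": [1, 2, 3, 4, 5, 6, 7, 8],
    "sales": [20, 22, 25, 24, 30, 33, 31, 36],
    "ad_spend": [5, 6, 7, 6, 9, 10, 9, 11],
    "region": ["North", "South", "North", "East",
               "South", "North", "East", "North"],
})

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Line plot: how sales evolve over time.
axes[0, 0].plot(df["day"], df["sales"])
axes[0, 0].set_title("Sales per day")

# Scatter plot: relationship between two numeric features.
axes[0, 1].scatter(df["ad_spend"], df["sales"])
axes[0, 1].set_title("Sales vs. ad spend")

# Histogram: distribution of a single feature.
axes[1, 0].hist(df["sales"], bins=5)
axes[1, 0].set_title("Distribution of sales")

# Pie chart: share of rows per category.
region_counts = df["region"].value_counts()
axes[1, 1].pie(region_counts, labels=region_counts.index, autopct="%1.0f%%")
axes[1, 1].set_title("Rows per region")

fig.tight_layout()
```
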

## Important Libraries
- pandas
- numpy
- seaborn
- matplotlib
- scikit-learn

## Open Data Repositories

- UC Irvine Machine Learning Repository: https://archive.ics.uci.edu/ml/index.php
- Kaggle: https://www.kaggle.com/datasets
- Wikipedia: https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research