# Exploratory Data Analysis for Sales Data
***

## Hypothesis Testing (t-test)

**Null Hypothesis (H0):** There is no linear relationship between the price of each item (`priceeach`) and total sales (`sales`).

$H_0: \beta_1 = 0$

The null hypothesis asserts that the coefficient $\beta_1$ for `priceeach` in the regression equation is zero, meaning that the price of each item has no linear effect on sales.

**Alternative Hypothesis (H1):** There is a linear relationship between the price of each item (`priceeach`) and total sales (`sales`).

$H_1: \beta_1 \neq 0$

The alternative hypothesis claims that $\beta_1$ is not zero, indicating that the price of each item does have a linear effect on sales. This two-sided alternative is typically what the analysis tries to establish: if the t-test rejects $H_0$, the effect of `priceeach` on `sales` is called statistically significant.
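The t-test named in the section title is not written out above. For simple linear regression, the standard statistic for $H_0: \beta_1 = 0$ divides the estimated slope $b_1$ by its standard error, where $\hat{y}_i = b_0 + b_1 x_i$ are the fitted values and $x_i$, $y_i$ are the `priceeach` and `sales` values of row $i$. Under $H_0$ it follows a t-distribution with $n - 2$ degrees of freedom:

$$t = \frac{b_1}{\operatorname{SE}(b_1)}, \qquad \operatorname{SE}(b_1) = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n - 2)}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$

$H_0$ is rejected at significance level $\alpha$ when $|t| > t_{\alpha/2,\, n-2}$, or equivalently when the two-sided p-value is below $\alpha$.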


#### The simple linear regression model for predicting sales is given by the equation:

$\widehat{\text{sales}} = b_0 + b_1 \cdot \text{priceeach}$

Where:
- $\widehat{\text{sales}}$ is the predicted (estimated) value of sales.
- $b_0$ is the y-intercept: the estimated value of sales when $\text{priceeach}$ is 0.
- $b_1$ is the slope coefficient: the estimated change in sales for a one-unit increase in $\text{priceeach}$. It is the sample estimate of $\beta_1$ from the hypotheses above.
- $\text{priceeach}$ is the independent (predictor) variable: the price of each unit.

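#### The slope coefficient $b_1$ in simple linear regression is calculated as:

$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

where $x_i$ is the `priceeach` value and $y_i$ the `sales` value of row $i$, $\bar{x}$ and $\bar{y}$ are their sample means, and $n$ is the number of rows.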
***
#### The y-intercept $b_0$ in simple linear regression is calculated as:

$$b_0 = \bar{y} - b_1\bar{x}$$
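A minimal pure-Python sketch of the slope and intercept formulas, assuming the data arrive as two equal-length sequences of numbers (a list or a pandas Series such as `df['priceeach']` both work). `fit_simple_ols` is a hypothetical helper name, and the five data points are made up for illustration:

```python
from statistics import fmean


def fit_simple_ols(x, y):
    """Return (b0, b1) for y = b0 + b1 * x using the closed-form OLS formulas."""
    # Convert to float so Decimal values from a database can mix with float means
    x = [float(v) for v in x]
    y = [float(v) for v in y]
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2 points)")
    x_bar, y_bar = fmean(x), fmean(y)
    # Numerator of b1: sum of cross-deviations (x_i - x_bar)(y_i - y_bar)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator of b1: sum of squared deviations (x_i - x_bar)^2
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance, so the slope is undefined")
    b1 = sxy / sxx
    # The fitted line passes through (x_bar, y_bar)
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up illustration data, not rows from the sales table
prices = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = fit_simple_ols(prices, sales)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")  # b0 = 2.20, b1 = 0.60
```

The values are converted to `float` up front because database drivers often return `DECIMAL` columns as `decimal.Decimal`, and subtracting a `float` mean from a `Decimal` raises `TypeError`.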

## Summary Statistics for `priceeach` (sum, mean)

```python
import numpy as np
import pandas as pd
from sales_data_extractor import DatabaseConnector


def calculate_summary_stats(data):
    """Return the mean and sum of a numeric column."""
    summary_stats = {
        # round() without ndigits rounds the mean to the nearest whole number
        "Mean": round(np.mean(data)),
        "Sum": np.sum(data),
    }
    return summary_stats


if __name__ == "__main__":
    db_connector = DatabaseConnector('config.json')
    try:
        # input() already returns a str, so no conversion is needed
        user_query = input("Enter the query you would like to run: ")

        # execute_query returns the result rows and the column names
        results, column_names = db_connector.execute_query(user_query)
        df = pd.DataFrame(results, columns=column_names)

        # Calculate summary statistics for a specific column
        column_to_analyze = 'priceeach'
        if column_to_analyze in df.columns:
            summary_stats = calculate_summary_stats(df[column_to_analyze])
            print(f"Summary Statistics for {column_to_analyze}:")
            for stat, value in summary_stats.items():
                print(f"{stat}: {value}")
        else:
            print(f"Column '{column_to_analyze}' not found in the DataFrame.")
    finally:
        # Close the connection even if the query fails
        db_connector.close_connection()
```


```text
Enter the query you would like to run:  SELECT * FROM sales

Summary Statistics for priceeach:
Mean: 94
Sum: 18663.55
```
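This cell and the next one differ only in `column_to_analyze`. One way to avoid repeating the whole pipeline is a helper that loops over several columns. The sketch below is a hypothetical `summarize_columns` (not part of `sales_data_extractor`) that uses only the standard library and, unlike `calculate_summary_stats`, rounds the mean and the sum to the same number of decimals (`ndigits=2` by default) instead of rounding the mean to a whole number:

```python
import math
import statistics


def summarize_columns(table, columns, ndigits=2):
    """Return {column: {"Mean": ..., "Sum": ...}} for each requested column.

    `table` can be any mapping from column name to a sequence of numbers,
    such as a dict of lists or a pandas DataFrame.
    """
    summary = {}
    for column in columns:
        if column not in table:
            print(f"Column '{column}' not found; skipping.")
            continue
        # float() also accepts decimal.Decimal values returned by database drivers
        values = [float(v) for v in table[column]]
        summary[column] = {
            "Mean": round(statistics.fmean(values), ndigits),
            # math.fsum avoids accumulating floating-point rounding error
            "Sum": round(math.fsum(values), ndigits),
        }
    return summary


# Hypothetical toy rows, not taken from the sales table
demo = {"priceeach": [10.0, 20.0, 30.5], "sales": [100.0, 250.0, 400.0]}
for column, stats in summarize_columns(demo, ["priceeach", "sales"]).items():
    print(f"Summary Statistics for {column}: {stats}")
```

With the DataFrame built in the cell above, `summarize_columns(df, ['priceeach', 'sales'])` would report both columns from a single query; the `column not in table` check works for a DataFrame because its membership test looks at column labels.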


## Summary Statistics for `sales` (sum, mean)

```python
import numpy as np
import pandas as pd
from sales_data_extractor import DatabaseConnector


def calculate_summary_stats(data):
    """Return the mean and sum of a numeric column."""
    summary_stats = {
        # round() without ndigits rounds the mean to the nearest whole number
        "Mean": round(np.mean(data)),
        "Sum": np.sum(data),
    }
    return summary_stats


if __name__ == "__main__":
    db_connector = DatabaseConnector('config.json')
    try:
        # input() already returns a str, so no conversion is needed
        user_query = input("Enter the query you would like to run: ")

        # execute_query returns the result rows and the column names
        results, column_names = db_connector.execute_query(user_query)
        df = pd.DataFrame(results, columns=column_names)

        # Calculate summary statistics for a specific column
        column_to_analyze = 'sales'
        if column_to_analyze in df.columns:
            summary_stats = calculate_summary_stats(df[column_to_analyze])
            print(f"Summary Statistics for {column_to_analyze}:")
            for stat, value in summary_stats.items():
                print(f"{stat}: {value}")
        else:
            print(f"Column '{column_to_analyze}' not found in the DataFrame.")
    finally:
        # Close the connection even if the query fails
        db_connector.close_connection()
```


```text
Enter the query you would like to run:  SELECT * FROM sales

Summary Statistics for sales:
Mean: 4623
Sum: 915408
```
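The cells above report sums and means but do not yet test $H_0: \beta_1 = 0$. A minimal standard-library sketch of the slope t-test follows, assuming `x` holds the `priceeach` values and `y` the `sales` values. `slope_t_test` is a hypothetical helper, and the demo data are made up:

```python
import math
from statistics import fmean


def slope_t_test(x, y):
    """t-test of H0: beta_1 = 0 in the simple regression y = b0 + b1 * x.

    Returns (b1, se_b1, t, df), where df = n - 2.
    """
    x = [float(v) for v in x]
    y = [float(v) for v in y]
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length (at least 3 points)")
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance, so the slope is undefined")
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    # Residual sum of squares around the fitted line y_hat = b0 + b1 * x
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    # SE(b1) = sqrt((SSE / (n - 2)) / Sxx)
    se_b1 = math.sqrt(sse / df / sxx)
    if se_b1 == 0:
        raise ValueError("all residuals are zero, so t is undefined")
    return b1, se_b1, b1 / se_b1, df


# Made-up illustration data, not rows from the sales table
b1, se_b1, t, df = slope_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"b1 = {b1:.3f}, SE = {se_b1:.3f}, t = {t:.3f}, df = {df}")
# b1 = 0.600, SE = 0.283, t = 2.121, df = 3
```

To apply it to the table, `df['priceeach']` and `df['sales']` from a DataFrame built as in the cells above could be passed in. The sketch stops at the t statistic because the standard library has no t-distribution CDF; if SciPy is available, `scipy.stats.linregress(x, y)` returns the slope, its standard error (`stderr`) and a two-sided `pvalue` for the null hypothesis that the slope is zero, which can serve as a cross-check.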
