![descriptive statistics II.png](attachment:b4d070d0-b5fe-4b80-b9c3-1a7429074be0.png)

# Descriptive Statistics (leveraging YDATA Libraries) 
#### by Joe Eberle started on 02_03_2023

**Descriptive statistics** serve as a fundamental tool for data scientists to comprehend the characteristics of their datasets, enabling them to uncover patterns and trends. By summarizing key features such as central tendency and variability, descriptive statistics offer concise insights into the distribution of data, facilitating informed decision-making and hypothesis testing. Ultimately, their utilization empowers data scientists to extract meaningful interpretations and communicate findings effectively to stakeholders, driving informed actions and solutions.

**Descriptive statistics** involve methods for summarizing and describing the features of the entire dataset. 

Statistics includes measures such as **mean, median, and mode** for **central tendency**, as well as measures like **standard deviation** and range for dispersion or spread. 

These statistics offer insights into the **distribution, variability, and characteristics** of the data, aiding in understanding and interpreting its underlying patterns and trends.
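
As a small worked illustration of these measures, the sketch below computes them with Python's standard `statistics` module on a made-up list of values (the numbers are illustrative and do not come from any dataset used here). It uses the sample standard deviation, which is also what pandas reports in `df.describe()`.

```python
import statistics

# Hypothetical sample values, chosen only for illustration
values = [2, 4, 4, 5, 7, 9, 11]

summary = {
    "mean": statistics.mean(values),      # central tendency: arithmetic average
    "median": statistics.median(values),  # central tendency: middle value
    "mode": statistics.mode(values),      # central tendency: most frequent value
    "std": statistics.stdev(values),      # dispersion: sample standard deviation (n - 1)
    "range": max(values) - min(values),   # dispersion: largest minus smallest value
}

for name, value in summary.items():
    print(f"{name}: {value}")
```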

In [1]:
# Install any libraries you don't already have
first_install = False 
if first_install:
    !pip install ydata-profiling
    !pip install pyttsx3 

In [2]:
# heart_data_filename = 'C:\\Data_Science_Data\\Test_Data\\healthcare\\heart_data.csv'
# fetal_health_filename = 'C:\\Data_Science_Data\\Test_Data\\healthcare\\fetal_health.csv'
# diabetes_data_filename = 'C:\\Data_Science_Data\\Test_Data\\healthcare\\diabetes_data.csv'
# stroke_data_filename = 'C:\\Data_Science_Data\\Test_Data\\healthcare\\stroke_data.csv'
# hypertension_data_filename = 'C:\\Data_Science_Data\\Test_Data\\healthcare\\hypertension_data.csv'
# aihs_data_filename = 'C:\\working_directory\\excel\\AIHS_patient.xlsx'
# titanic_data_repo = r"https://raw.githubusercontent.com/JoeEberle/datasets/main/titanic.csv"
# df = pd.read_csv(aihs_data_filename)

In [3]:
# Import all of the libraries you need (ydata-profiling was formerly published as pandas-profiling)
import ydata_profiling  # provides descriptive statistics in HTML for any DataFrame
import display_descriptive as dd
import pandas as pd  # pandas: high-performance data manipulation
import os

## Required Setup Step 0 - Initiate Configuration Settings and name the overall solution

In [4]:
import configparser 
config = configparser.ConfigParser()
cfg = config.read('config.ini')  

solution_name = 'descriptive_statistics'
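
`config.read` returns the list of files it managed to parse and silently skips a missing file, so it can help to check the result and supply fallbacks. A minimal sketch, assuming a hypothetical `[paths]` section with an `output_dir` key (the contents of `config.ini` are not shown in this notebook):

```python
import configparser


def load_setting(config_path, section, key, default):
    """Return a config value, or `default` if the file, section, or key is missing."""
    config = configparser.ConfigParser()
    files_read = config.read(config_path)  # list of files successfully parsed
    if not files_read:
        print(f"'{config_path}' not found; using default for {section}.{key}")
    return config.get(section, key, fallback=default)


# 'paths' and 'output_dir' are hypothetical names used only for illustration
output_dir = load_setting("config.ini", "paths", "output_dir", r"c:\working_directory\html")
print(output_dir)
```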

## Required Setup Step 0 - Establish the logger and record the start time

In [5]:
# Establish the Python logger
import logging  # built-in Python library that does not need to be installed
import quick_logger as ql

start_time = ql.set_start_time()
# Note: this rebinds the name `logging` to whatever quick_logger returns
logging = ql.create_logger_Start(solution_name, start_time)
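
`quick_logger` is the author's helper module and its internals are not shown here. For readers without it, a rough stand-in using only the standard library might look like the sketch below; the function names and log format are assumptions, not the module's actual API.

```python
import logging
import time


def create_logger(name):
    """Create (or fetch) a named logger that writes timestamped messages to the console."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers when the cell is re-run
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def elapsed_seconds(start):
    """Seconds elapsed since `start`, a value taken from time.perf_counter()."""
    return time.perf_counter() - start


start = time.perf_counter()
log = create_logger("descriptive_statistics")
log.info("Step 0 - logger established")
log.info("Elapsed: %.3f seconds", elapsed_seconds(start))
```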

## Optional Step 0 - Build a working directory to house your analysis

In [6]:
directory_path = r'c:\working_directory\html'
# Create the directory if it doesn't exist
if not os.path.exists(directory_path):
    os.makedirs(directory_path)
    print(f"Directory '{directory_path}' created successfully.")
else:
    print(f"Directory '{directory_path}' already exists.")

Directory 'c:\working_directory\html' already exists.


In [7]:
definition = '''
**Descriptive statistics** serve as a fundamental tool for data scientists to comprehend the characteristics of their datasets, enabling them to uncover patterns and trends. By summarizing key features such as central tendency and variability, descriptive statistics offer concise insights into the distribution of data, facilitating informed decision-making and hypothesis testing. Ultimately, their utilization empowers data scientists to extract meaningful interpretations and communicate findings effectively to stakeholders, driving informed actions and solutions.

**Descriptive statistics** involve methods for summarizing and describing the features of the entire dataset. 

Statistics includes measures such as **mean, median, and mode** for **central tendency**, as well as measures like **standard deviation** and range for dispersion or spread. 

These statistics offer insights into the **distribution, variability, and characteristics** of the data, aiding in understanding and interpreting its underlying patterns and trends.

''' 
# Write the solution definition out to the solution_description.md file
file_name = "solution_description.md"
with open(file_name, 'w') as f:
    f.write(definition)

talking_code = False
if talking_code:
    # tc is not imported anywhere in this notebook; import the speech helper before enabling this branch
    tc.print_say(definition)
else:
    print(definition)



## Step 1 - Load ANY data set for which to run discovery or data profiling

In [8]:
logging.info(f'{solution_name} - Step 1 - Load ANY data set for which to run discovery or data profiling')  
getting_titanic_data = True
if getting_titanic_data: 
    df = pd.read_csv("https://raw.githubusercontent.com/JoeEberle/reference_datasets/main/titanic.csv")    # Read the CSV file into a pandas DataFrame
    print(f'The data contains {df.shape[0]} rows and {df.shape[1]} columns')
    df_titanic = df    

The data contains 1310 rows and 14 columns


## Example - df.columns lists every column name

In [9]:
df.columns

Index(['pclass', 'survived', 'name', 'sex', 'age', 'sibsp', 'parch', 'ticket',
       'fare', 'cabin', 'embarked', 'boat', 'body', 'home.dest'],
      dtype='object')

## Example - df.describe() provides high-level statistics for each numeric column

In [10]:
df.describe() 

| | pclass | survived | age | sibsp | parch | fare | body |
|---|---|---|---|---|---|---|---|
| count | 1309.0 | 1309.0 | 1046.0 | 1309.0 | 1309.0 | 1308.0 | 121.0 |
| mean | 2.294882 | 0.381971 | 29.881135 | 0.498854 | 0.385027 | 33.295479 | 160.809917 |
| std | 0.837836 | 0.486055 | 14.4135 | 1.041658 | 0.86556 | 51.758668 | 97.696922 |
| min | 1.0 | 0.0 | 0.1667 | 0.0 | 0.0 | 0.0 | 1.0 |
| 25% | 2.0 | 0.0 | 21.0 | 0.0 | 0.0 | 7.8958 | 72.0 |
| 50% | 3.0 | 0.0 | 28.0 | 0.0 | 0.0 | 14.4542 | 155.0 |
| 75% | 3.0 | 1.0 | 39.0 | 1.0 | 0.0 | 31.275 | 256.0 |
| max | 3.0 | 1.0 | 80.0 | 8.0 | 9.0 | 512.3292 | 328.0 |
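
The `count` row above reports non-null values only, which is why `age` shows 1046 while `pclass` shows 1309. One quick way to see such gaps directly, sketched here on a tiny made-up frame rather than the Titanic data, is to count nulls per column; passing `include="all"` additionally brings non-numeric columns into the summary.

```python
import pandas as pd

# Tiny made-up frame standing in for a real dataset
toy = pd.DataFrame(
    {
        "age": [22.0, None, 35.0, 58.0],
        "sex": ["male", "female", "female", None],
        "fare": [7.25, 71.28, 8.05, 26.55],
    }
)

missing = toy.isna().sum()                  # null count per column
non_null = toy.count()                      # matches the `count` row of describe()
full_summary = toy.describe(include="all")  # numeric and non-numeric columns together

print(missing)
print(full_summary)
```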


In [11]:
df.shape

(1310, 14)

## Step 2 - Render descriptive statistics and profile every feature (column) of the dataset

In [12]:
data_set_name = 'Titanic'
dd.display_descriptive_statistics(dd.get_descriptive_statistics(df,data_set_name))


Outputting descriptive statistics profile to: C:\working_directory\html\Titanicdescriptive_statistics_profile.html


'Displaying C:\\working_directory\\html\\Titanicdescriptive_statistics_profile.html in web brower'
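
`display_descriptive` is the author's wrapper and its source is not shown. A minimal sketch of what such a wrapper might do with `ydata_profiling` directly follows; the helper names are hypothetical, and the output file name simply mirrors the pattern seen in the log line above.

```python
import webbrowser
from pathlib import Path


def report_path(directory, data_set_name):
    """Build the HTML report path, mirroring the file-name pattern in the output above."""
    return Path(directory) / f"{data_set_name}descriptive_statistics_profile.html"


def write_profile(df, data_set_name, directory, open_in_browser=False):
    """Profile `df` with ydata-profiling and save the report as HTML."""
    from ydata_profiling import ProfileReport  # imported lazily; requires ydata-profiling

    target = report_path(directory, data_set_name)
    target.parent.mkdir(parents=True, exist_ok=True)
    ProfileReport(df, title=f"{data_set_name} descriptive statistics").to_file(target)
    if open_in_browser:
        webbrowser.open(target.resolve().as_uri())
    return target


print(report_path("html", "Titanic").name)
```

Under these assumptions, calling `write_profile(df, 'Titanic', directory_path)` would produce a report comparable to the one generated above.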

## Step 0 - Process End - display log

In [None]:
# Calculate and classify the process performance 
status = ql.calculate_process_performance(solution_name, start_time) 
print(ql.append_log_file(solution_name))  

# https://github.com/JoeEberle/ --- josepheberle@outlook.com 


## Optional Jupyter Notebook Upgrade

In [None]:
# Upgrade your Jupyter Notebook
installing_jupyter_extensions = False
if installing_jupyter_extensions:
    !pip install jupyter_contrib_nbextensions
    !jupyter contrib nbextension install --user