datasci_3_eda

This is an assignment for HHA 507.

Objective: Engage in the critical phase of Exploratory Data Analysis (EDA) using the tools and techniques from Python to uncover patterns, spot anomalies, test hypotheses, and identify the main structures of your dataset.

Instructions:

Univariate Analysis:

Load a dataset of your choice in a Colab notebook (.ipynb) or a Python script (.py). You can reuse one from a previous assignment or find a new one.
Manually perform a univariate analysis to understand the distribution of each variable. This includes calculating measures of central tendency (mean, median, mode) and measures of spread (range, variance, standard deviation, IQR).
Visualize the distribution of select numerical variables using histograms.
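
As a rough illustration of the manual univariate step, the sketch below computes the listed measures with Python's standard `statistics` module and counts values per equal-width bin, which is what a histogram displays. The `ages` list and the helper names `summarize` and `histogram_counts` are hypothetical and are not part of the assignment or of this repo's notebook.

```python
import statistics

# Hypothetical sample values standing in for one numerical column.
ages = [23, 25, 25, 29, 31, 34, 35, 41, 52, 60]


def summarize(values):
    """Return central-tendency and spread measures for one numeric variable."""
    # Quartiles computed with the "inclusive" method (data treated as the full population range).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
        "iqr": q3 - q1,
    }


def histogram_counts(values, bins=4):
    """Count values per equal-width bin, i.e. the bar heights of a histogram."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    if width == 0:
        # All values identical: everything falls in the first bin.
        return [len(values)] + [0] * (bins - 1)
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts


if __name__ == "__main__":
    for name, value in summarize(ages).items():
        print(f"{name}: {value}")
    print("bin counts:", histogram_counts(ages, bins=4))
```

In a notebook, a plotting call such as matplotlib's `plt.hist(ages, bins=4)` would draw bars for the same equal-width bins.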

Bivariate Analysis:

Analyze the relationship between pairs of variables.
- Use scatter plots to explore potential relationships between two numerical variables.
- For categorical and numerical variable pairs, use boxplots.
Compute correlation coefficients for numerical variables and document any strong correlations observed.
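
One way to make the correlation step concrete is to compute the Pearson coefficient by hand, as in the sketch below. The sample lists, the helper names, and the 0.7 cutoff for a "strong" correlation are assumptions chosen for illustration; the assignment does not define a threshold.

```python
import math

# Hypothetical paired observations for two numerical variables.
height_cm = [150, 160, 165, 170, 180]
weight_kg = [50, 58, 63, 68, 80]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def is_strong(r, threshold=0.7):
    """Flag a correlation as strong; the 0.7 cutoff is an assumed convention."""
    return abs(r) >= threshold


if __name__ == "__main__":
    r = pearson_r(height_cm, weight_kg)
    print(f"r = {r:.3f}, strong: {is_strong(r)}")
```

With a pandas DataFrame, `df.corr()` returns the same coefficient for every pair of numerical columns at once.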

Handling Outliers:

Identify outliers in your dataset using the IQR method or visualization tools.
Decide on an approach to handle these outliers (e.g., remove, replace, or retain) and justify your decision in a markdown cell.
If there are no outliers at 1, 2, or 3 standard deviations from the mean (that is, no absolute z-scores >= 1), state that and support it with your code.
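
The two detection approaches named above can be sketched as follows. The `values` list, the helper names, and the conventional 1.5 multiplier on the IQR are assumptions for illustration, not choices taken from this repo's notebook.

```python
import statistics

# Hypothetical values; 120 is deliberately extreme.
values = [10, 12, 12, 13, 14, 15, 15, 16, 18, 120]


def iqr_outliers(data, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]


def zscore_outliers(data, threshold=3.0):
    """Return values whose absolute z-score is at least `threshold`."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation
    if sd == 0:
        # No spread means no value can deviate from the mean.
        return []
    return [x for x in data if abs((x - mean) / sd) >= threshold]


if __name__ == "__main__":
    print("IQR method:", iqr_outliers(values))
    for t in (1, 2, 3):
        print(f"|z| >= {t}:", zscore_outliers(values, threshold=t))
```

On this sample the IQR rule flags 120, while the z-score rule flags it at thresholds 1 and 2 but not at 3, because the extreme value inflates the standard deviation it is measured against. That difference is worth noting when justifying which method to rely on.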

Automated Analysis:

Use the automated EDA tool pandas profiling (for example, see https://book.datascience.appliedhealthinformatics.com/docs/Ch3/automatic_EDA).
Load your dataset and analyze it.
Save the output (.html) in your repo, within a folder called automaticEDA.
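
A minimal sketch of the automated step is shown below, assuming the `ydata-profiling` package (the current name of pandas-profiling) and pandas are installed. The function names, the report file name `eda_report.html`, and the CSV path in the usage line are hypothetical.

```python
from pathlib import Path


def report_path(out_dir="automaticEDA", name="eda_report.html"):
    """Build the output path for the HTML report inside the target folder."""
    return Path(out_dir) / name


def build_report(csv_path, out_dir="automaticEDA"):
    """Profile a CSV file and write the HTML report; returns the output path."""
    # Imported here so this module still loads where the packages are missing.
    import pandas as pd
    from ydata_profiling import ProfileReport  # successor to pandas-profiling

    df = pd.read_csv(csv_path)
    target = report_path(out_dir)
    # Create the automaticEDA folder if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    ProfileReport(df, title="Automated EDA").to_file(target)
    return target


if __name__ == "__main__":
    # Hypothetical dataset location; replace with a real CSV path.
    print(build_report("datasets/example.csv"))
```

Older environments that still ship the original package would import `ProfileReport` from `pandas_profiling` instead.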

See the datasets folder for the dataset used in this repo, and the automaticEDA folder for the automated EDA report generated with pandas profiling.

Helzheng123/datasci_3_eda

Folders and files

Latest commit

History

Repository files navigation

datasci_3_eda

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages