This repo contains Jupyter notebooks with code from two data visualization courses, plus notes on simple exploratory data analysis (EDA) techniques.
url => https://www.coursera.org/learn/python-plotting - A course about matplotlib and the chart types it provides
url => https://www.kaggle.com/learn/data-visualization - A course that gives an overview of the most common charts using the Seaborn package
url => https://www.coursera.org/learn/data-analysis-with-python (week 3) - Simple EDA techniques that give good insight into a dataset
Steps:
1. Identification of variables, data types and shape of the dataset
- Numerical => Discrete or Continuous
- Categorical => Ordinal or Nominal
2. Analyzing basic metrics => Statistical summary
3. Non-Graphical Univariate Analysis
- Get the count and list of unique values
- Filtering based on conditions + grouping
4. Graphical Univariate Analysis
- Analyzing individual feature patterns using visualization
- Histograms and box plots (numeric features), count plots (categorical features)
5. Bivariate Analysis
- Identifying relationships between features
- Scatter plots (seaborn regplot), box plots (x is a categorical feature, y is a numeric one such as the target), heatmaps
- sns.countplot() - compare the distribution of two categorical columns, e.g. sex vs. disease / no disease
6. Analyzing outliers and missing values
7. Correlation Analysis
- Regplots (seaborn)
- Correlation matrix
- Check for redundant variables (pairs of highly correlated features)
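
The non-graphical parts of the steps above can be sketched with pandas. This is a minimal illustration, not code taken from the course notebooks: the toy DataFrame, its column names (`age`, `sex`, `chol`, `disease`) and the 1.5 * IQR outlier rule are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset, invented for illustration only.
df = pd.DataFrame({
    "age": [29, 41, 35, 52, 47, 38, 60, 33],
    "sex": ["F", "M", "M", "F", "M", "F", "M", "F"],
    "chol": [180.0, 240.0, 210.0, np.nan, 260.0, 200.0, 420.0, 190.0],
    "disease": [0, 1, 0, 1, 1, 0, 1, 0],
})

# Step 1: shape, data types, numeric vs. categorical columns
print(df.shape)
print(df.dtypes)
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Step 2: statistical summary of the numeric columns
summary = df.describe()
print(summary)

# Step 3: counts of unique values, filtering on a condition, grouping
sex_counts = df["sex"].value_counts()
older_mean_chol = df[df["age"] > 40].groupby("sex")["chol"].mean()

# Step 5: tabular counterpart of a count plot of two categorical columns
sex_vs_disease = pd.crosstab(df["sex"], df["disease"])
print(sex_vs_disease)

# Step 6: missing values per column, and outliers by the 1.5 * IQR rule
# (the IQR rule is one common choice, assumed here)
missing = df.isna().sum()
q1, q3 = df["chol"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["chol"] < q1 - 1.5 * iqr) | (df["chol"] > q3 + 1.5 * iqr)]
print(outliers)

# Step 7: correlation matrix of the numeric columns
corr = df[numeric_cols].corr()
print(corr)
```

The graphical steps would typically plot the same columns with seaborn, for example `sns.histplot`, `sns.boxplot`, `sns.countplot`, `sns.regplot`, and `sns.heatmap` applied to `corr`.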