## Understanding the Categories of your Data

Data can be divided into two broad categories: categorical (qualitative) and numerical (quantitative). Each of these splits into two subtypes: categorical data is either nominal or ordinal, and numerical data is either discrete or continuous.

**Nominal Data**: Nominal data represents categories without any intrinsic ordering or ranking. It is used for labeling or naming attributes, where the numbers or labels are merely identifiers and do not imply a quantitative value or order. The focus is on "naming" or "categorizing" without an implied sequence.

**Ordinal Data**: Ordinal data represents categories with a clear, inherent order, but the intervals between the categories are not necessarily equal or known. Ordinal data indicates the position or rank of items in a sequence, where the relative ordering is significant, but the exact difference between ranks is not defined.

**Discrete Data**: Discrete data consists of distinct, separate values where each value can be counted. It represents countable items and often involves integers. The key aspect of discrete data is the presence of a finite number of values between any two points.

**Continuous Data**: Continuous data represents measurements and can take on any value within a given range. The range can be infinite, and the values are often represented by fractions or decimals. Continuous data can be subdivided infinitely, meaning that between any two values there are infinitely many other possible values.
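The four types above can be mirrored in pandas. The following is a minimal sketch on a made-up toy dataset (the column names and values are hypothetical, chosen only for illustration). It shows one way to mark a column as nominal (unordered category) or ordinal (ordered category), while discrete and continuous data map naturally to integer and float columns.

```python
import pandas as pd

# Hypothetical toy dataset; column names and values are invented for illustration.
df = pd.DataFrame({
    "city": ["Berlin", "Paris", "Berlin", "Rome"],        # nominal
    "satisfaction": ["low", "high", "medium", "high"],    # ordinal
    "n_children": [0, 2, 1, 3],                           # discrete
    "height_cm": [172.5, 181.2, 165.0, 158.7],            # continuous
})

# Nominal: an unordered category, so only equality checks make sense.
df["city"] = df["city"].astype("category")

# Ordinal: an ordered category, so comparisons and min/max are meaningful.
levels = ["low", "medium", "high"]
df["satisfaction"] = pd.Categorical(df["satisfaction"], categories=levels, ordered=True)

print(df.dtypes)
print(df["satisfaction"].min(), df["satisfaction"].max())

# Count the rows rated "medium" or better.
n_medium_or_better = (df["satisfaction"] >= "medium").sum()
print(n_medium_or_better)
```

Declaring the order explicitly keeps pandas from sorting the levels alphabetically, which would otherwise place "high" before "low".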

# Exploratory Data Analysis | EDA

Exploratory Data Analysis (EDA) aims to uncover underlying patterns, spot anomalies, test hypotheses, and check assumptions through a detailed examination of the dataset. By performing EDA, we mitigate the risk of misleading or inaccurate results, leading to more effective and informed decision-making in various fields, from business to scientific research.<br>


**1. Understanding the Dataset**<br>
Initial Data Inspection: Familiarize yourself with the dataset. What is the nature of the data: time series, categorical, continuous, etc.? Make sure you understand each column and its type, e.g. nominal, ordinal, discrete or continuous.
<br>

**2. Data Cleaning and Preprocessing (We will learn about this step later in the Bootcamp)**<br>
Missing Values: Identify missing values and decide on a strategy such as imputation or removal.<br>
Data Type Conversion: Ensure that each variable is of the correct type, e.g. int, float, object or datetime.<br>
Duplicates: Check for and remove any duplicate records.
<br>
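As a preview, the cleaning steps above could look roughly like this in pandas. This is a sketch on a hypothetical order table (all names and values are invented), and median imputation is only one of several possible strategies, used here as an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: dates and quantities stored as strings, some prices missing.
df = pd.DataFrame({
    "order_date": ["2024-01-05", "2024-01-06", "2024-01-06", "2024-01-09"],
    "quantity": ["3", "1", "1", "2"],
    "price": [9.99, np.nan, np.nan, 4.50],
})

# 1. Missing values: count them per column before deciding what to do.
missing_per_column = df.isna().sum()
print(missing_per_column)

# 2. Data type conversion: strings to datetime and integer.
df["order_date"] = pd.to_datetime(df["order_date"])
df["quantity"] = df["quantity"].astype(int)

# 3. Duplicates: count fully identical rows, then drop them.
n_duplicates = df.duplicated().sum()
df = df.drop_duplicates()

# 4. Imputation (one possible strategy): fill missing prices with the median.
df["price"] = df["price"].fillna(df["price"].median())
print(df)
```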

**3. Univariate Analysis**<br>
Categorical Data: Use bar charts to understand the distribution.<br>
Continuous Data: Apply histograms, box plots, or density plots to understand the distribution and identify any outliers.
<br>

**4. Bivariate/Multivariate Analysis**<br>
Visualize correlation matrices as a heatmap to identify relationships between numerical variables. Use scatter plots or pair plots for visualizing relationships between pairs of continuous variables.

For categorical variables, aggregate with `groupby()`, or with a two-column `groupby()` followed by `unstack()`. Visualize the results with `sns.barplot()`, adding `hue=` for the second category. The result of a two-column `groupby()` + `unstack()` can also be visualized as a heatmap.

Split up continuous variables across a categorical column to visualize and compare the distributions.
<br>
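The bivariate techniques above can be sketched as follows. The employee table is hypothetical and tiny, so the numbers are only meant to show the mechanics, not to suggest any real pattern.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical data: two categorical and two numerical columns.
df = pd.DataFrame({
    "department": ["sales", "sales", "it", "it", "hr", "hr", "sales", "it"],
    "remote": ["yes", "no", "yes", "no", "yes", "no", "yes", "yes"],
    "salary": [52, 48, 70, 65, 45, 43, 55, 72],
    "experience": [3, 2, 8, 6, 4, 3, 5, 9],
})

# Correlation matrix of the numerical columns.
corr = df[["salary", "experience"]].corr()

# Two-column groupby + unstack: departments as rows, remote status as columns.
pivot = df.groupby(["department", "remote"])["salary"].mean().unstack()
print(pivot)

fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# Numerical vs numerical: correlation heatmap.
sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm", ax=axes[0, 0])

# Categorical vs categorical: mean salary per department, split by hue.
sns.barplot(data=df, x="department", y="salary", hue="remote", ax=axes[0, 1])

# The unstacked table shown as a heatmap.
sns.heatmap(pivot, annot=True, fmt=".1f", ax=axes[1, 0])

# Continuous split across a categorical column: compare distributions.
sns.boxplot(data=df, x="department", y="salary", ax=axes[1, 1])

fig.tight_layout()
```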



## Familiarize yourself with the data + Heatmaps for Missing Values

Heatmaps provide a visual representation that is often more intuitive than a table of numbers and lets us quickly identify patterns of missing data. For example, you can easily spot whether missing values are randomly distributed across the dataset or concentrated in specific rows or columns. Large areas of missing data might indicate problems in data collection or processing. In numerical summaries the patterns and extent of missing data might be overlooked, but a heatmap makes them immediately apparent.


By understanding where and how data is missing, we can make more informed decisions about appropriate methods for data imputation or whether to exclude certain data points or features.
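One common way to draw such a heatmap is to pass the boolean mask from `df.isna()` to `sns.heatmap()`. The sketch below uses a randomly generated table with missing values planted on purpose (the column names `a` to `d` are hypothetical), so both patterns are visible: a block of missing rows in one column and a few scattered gaps in another.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical data: 50 rows, 4 numerical columns.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(50, 4)), columns=["a", "b", "c", "d"])

# Plant missing values: a contiguous block in "c", a few scattered gaps in "a".
df.loc[10:29, "c"] = np.nan
df.loc[[3, 17, 41], "a"] = np.nan

# Numerical summary: share of missing values per column.
missing_share = df.isna().mean()
print(missing_share)

# Visual summary: each cell is one value, highlighted where it is missing.
fig, ax = plt.subplots(figsize=(6, 6))
sns.heatmap(df.isna(), cbar=False, ax=ax)
ax.set_title("Missing values")
```

The numerical summary reports that 40% of `c` is missing, but only the heatmap shows that those values form one contiguous block of rows.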

## Make use of the Majestic `sns.catplot()` to quickly get insights from the data!

Options include:

`kind=` `"strip"`, `"swarm"`, `"box"`, `"violin"`, `"boxen"`, `"point"`, `"bar"`, or `"count"`
    
Play around with: `hue`, `row`, `col`
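A minimal sketch of `sns.catplot()` combining `kind`, `hue` and `col`. The data is randomly generated and the column names (`day`, `time`, `smoker`, `tip`) are hypothetical.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: three categorical columns and one continuous column.
rng = np.random.default_rng(2)
n = 120
df = pd.DataFrame({
    "day": rng.choice(["Thu", "Fri", "Sat"], size=n),
    "time": rng.choice(["Lunch", "Dinner"], size=n),
    "smoker": rng.choice(["yes", "no"], size=n),
    "tip": rng.gamma(shape=2.0, scale=1.5, size=n),
})

# One box plot per day, colored by smoker, with one panel per value of "time".
g = sns.catplot(
    data=df,
    x="day",
    y="tip",
    hue="smoker",
    col="time",
    kind="box",
)
```

Changing `kind="box"` to `"violin"` or `"strip"` keeps the same layout, which makes it easy to compare several views of the same split.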