### __Jr Data Analysis Tasks 01__

#### 📁 _Data Ingestion & Preprocessing_

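##### CSV Import with Custom Header and Separator

- Load a CSV file that uses a non-default separator and a custom header (a missing or non-standard header row).

- Pass the separator and the column names explicitly so the result is a ready-to-use DataFrame.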
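A minimal sketch of this task, assuming a semicolon-separated file with no header row. The in-memory text and the column names are made up for illustration; with a real file, pass its path instead of the `StringIO` object.

```python
import io
import pandas as pd

# Hypothetical file contents: semicolon-separated, no header row.
raw = io.StringIO("1001;North;250.5\n1002;South;99.0\n")

df = pd.read_csv(
    raw,
    sep=";",                                   # custom field separator
    header=None,                               # the file has no header row
    names=["Invoice_ID", "Region", "Sales"],   # supply our own column names
)
print(df.dtypes)
```

If the file does have a header row that you want to replace, `header=0` together with `names=[...]` discards the original header in favour of your names.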

##### Handling Missing Values

- Read an Excel file where missing values are represented as "N/A" or "-".

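- Pass na_values=["N/A", "-"] to the reader so these markers are parsed as missing (add keep_default_na=False if no other strings should count as missing), then use isna().sum() to count nulls per column.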
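The following sketch shows the idea on in-memory CSV text so it runs without a file. `pd.read_excel` accepts the same `na_values` and `keep_default_na` parameters, so the call carries over to an Excel workbook. The sample data is invented for illustration.

```python
import io
import pandas as pd

# Hypothetical data using "N/A" and "-" as missing-value markers.
raw = io.StringIO(
    "Customer,Sales,Region\n"
    "Ann,100,N/A\n"
    "Bob,-,West\n"
    "N/A,250,-\n"
)

df = pd.read_csv(
    raw,
    na_values=["N/A", "-"],   # treat exactly these strings as missing
    keep_default_na=False,    # do not apply pandas' built-in marker list
)

null_counts = df.isna().sum()  # number of nulls per column
print(null_counts)
```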

##### Fill Missing Sales Values

- Replace all missing values in the Sales column with the median of existing sales.
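One possible approach, assuming `Sales` is already numeric. `median()` skips missing values by default, so it reflects only the existing sales. The numbers are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Sales": [100.0, None, 300.0, 200.0, None]})

median_sales = df["Sales"].median()             # median of the non-missing values
df["Sales"] = df["Sales"].fillna(median_sales)  # fill the gaps with it
print(df)
```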

##### Remove Empty Rows

- Drop rows where all values are missing.

##### Drop Columns with Excessive Missing Data

- Drop any column with more than 60% missing values.
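The two drop tasks above can be sketched together. The example assumes a small made-up DataFrame. `isna().mean()` gives the share of missing values per column, which is compared against the 60% threshold.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Sales": [100.0, np.nan, 300.0, np.nan, 250.0],
    "Notes": [np.nan, np.nan, "ok", np.nan, np.nan],
    "Region": ["N", np.nan, "S", "E", "W"],
})

# Drop rows where every value is missing (row 1 here).
df = df.dropna(how="all")

# Keep only columns whose share of missing values is at most 60%.
missing_share = df.isna().mean()
df = df.loc[:, missing_share <= 0.60]
print(df)
```

Note that the order matters: removing empty rows first changes the missing share of every column.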

##### Check and Drop Duplicates

- Identify and drop fully duplicated rows.

##### Drop Partial Duplicates

- Drop duplicates only considering the columns ['Customer_ID', 'Invoice_ID'].
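A short sketch of both duplicate tasks on invented data. Keeping the first occurrence (`keep="first"`) is an assumption; use `keep="last"` or `keep=False` if the task calls for something else.

```python
import pandas as pd

df = pd.DataFrame({
    "Customer_ID": [1, 1, 2, 2, 3],
    "Invoice_ID": ["A", "A", "B", "B", "C"],
    "Sales": [10, 10, 20, 25, 30],
})

# Fully duplicated rows: identical in every column.
n_full_dupes = df.duplicated().sum()
no_full_dupes = df.drop_duplicates()

# Partial duplicates: only the key columns are compared.
no_partial_dupes = df.drop_duplicates(
    subset=["Customer_ID", "Invoice_ID"], keep="first"
)
print(n_full_dupes, len(no_full_dupes), len(no_partial_dupes))
```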

##### Import with Specific Decimal

- Read a CSV where the decimal separator is a comma (,) instead of a dot (.).
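A minimal sketch, assuming the file uses `;` as the field separator (common when the comma is taken by decimals). The data is made up.

```python
import io
import pandas as pd

# Hypothetical file: semicolon-separated, comma as decimal mark.
raw = io.StringIO("Product;Price\nWidget;12,50\nGadget;7,25\n")

df = pd.read_csv(raw, sep=";", decimal=",")  # parse "12,50" as 12.5
print(df["Price"].dtype)
```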

##### Custom Null Markers

- Read a file where empty cells are marked as "Unknown", "NA", or "Missing".

##### Validate Column Uniqueness

- Use .nunique() and .duplicated() to validate if a supposed ID column is truly unique.
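One way to run this check, sketched on an invented ID column. Note that `nunique()` ignores missing values by default, so a column with NaNs needs a separate null check.

```python
import pandas as pd

df = pd.DataFrame({"Customer_ID": [101, 102, 103, 102]})
ids = df["Customer_ID"]

# Unique only if the number of distinct values equals the number of rows.
is_unique = ids.nunique() == len(ids)

# keep=False marks every occurrence of a repeated value, not just the later ones.
repeated = df[ids.duplicated(keep=False)]
print(is_unique)
print(repeated)
```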

#### 📊 _Analysis and Aggregation_

##### Frequency Analysis

- Use .value_counts() to find the most common product sold per region.
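A possible sketch, assuming columns named `Region` and `Product` (the sample rows are invented). Within each region, `value_counts()` ranks the products and `idxmax()` picks the most frequent one; on a tie it returns the first label it meets.

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["North", "North", "North", "South", "South", "South"],
    "Product": ["Pen", "Pen", "Ink", "Ink", "Ink", "Pen"],
})

top_per_region = df.groupby("Region")["Product"].agg(
    lambda s: s.value_counts().idxmax()  # most frequent product in the group
)
print(top_per_region)
```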

##### Group Aggregation by Region

Group by "Region" and compute:

- Total Sales

- Mean Profit

- Max Sales

##### Multi-Level Aggregation

- Group by both "Region" and "Product" to compute sum of "Sales" and count of "Invoice_ID".

##### Sort Aggregated Data

- After groupby-agg, sort the results by "Total Sales" descending.
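The three aggregation tasks above can be sketched in one pass. The data is invented, and the output column names (`Total_Sales`, `Mean_Profit`, and so on) are assumptions: named aggregation needs valid Python identifiers, so underscores replace spaces.

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "South"],
    "Product": ["Pen", "Ink", "Pen", "Pen", "Ink"],
    "Invoice_ID": [1, 2, 3, 4, 5],
    "Sales": [100.0, 200.0, 50.0, 150.0, 400.0],
    "Profit": [10.0, 30.0, 5.0, 15.0, 40.0],
})

# Per-region totals, sorted by total sales in descending order.
by_region = (
    df.groupby("Region")
      .agg(
          Total_Sales=("Sales", "sum"),
          Mean_Profit=("Profit", "mean"),
          Max_Sales=("Sales", "max"),
      )
      .sort_values("Total_Sales", ascending=False)
)

# Two grouping keys produce a MultiIndex of (Region, Product).
by_region_product = df.groupby(["Region", "Product"]).agg(
    Sales_Sum=("Sales", "sum"),
    Invoice_Count=("Invoice_ID", "count"),
)
print(by_region)
print(by_region_product)
```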

##### String Cleaning with .str accessor

- Use .str.strip(), .str.lower(), .str.replace() to clean customer names.
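A short sketch on made-up names. Collapsing repeated whitespace with a regular expression is one illustrative use of `.str.replace()`; the actual cleaning rules depend on the data.

```python
import pandas as pd

names = pd.Series(["  Alice  Smith ", "BOB JONES", "carol-lee"])

clean = (
    names.str.strip()                           # trim leading/trailing spaces
         .str.lower()                           # normalise case
         .str.replace(r"\s+", " ", regex=True)  # collapse inner whitespace runs
)
print(clean.tolist())
```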

##### Filtering with Sets

- From a set of known loyal customers (set()), filter rows where the Customer_ID is part of that set.
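A minimal sketch with invented IDs. `Series.isin()` accepts a set directly and returns a boolean mask.

```python
import pandas as pd

df = pd.DataFrame({"Customer_ID": [1, 2, 3, 4], "Sales": [10, 20, 30, 40]})
loyal_customers = {2, 4, 9}  # hypothetical set of loyal customer IDs

loyal_rows = df[df["Customer_ID"].isin(loyal_customers)]
print(loyal_rows)
```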

##### Categorize with Dictionary Mapping

- Map the "Currency" column using a dictionary like {'USD': 'Dollar', 'EUR': 'Euro'}.
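A sketch of the mapping, writing to a new column (`Currency_Name` is an assumed name). Values missing from the dictionary, such as `GBP` here, become NaN, which is worth checking after the map.

```python
import pandas as pd

df = pd.DataFrame({"Currency": ["USD", "EUR", "GBP"]})
currency_names = {"USD": "Dollar", "EUR": "Euro"}

df["Currency_Name"] = df["Currency"].map(currency_names)
unmapped = df["Currency_Name"].isna().sum()  # codes with no dictionary entry
print(df)
```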

##### Top N Products

- Use .value_counts() and slicing to find the top 5 most frequent products.
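A small sketch on an invented product column. `value_counts()` sorts by frequency in descending order, so taking the first five entries yields the top 5.

```python
import pandas as pd

# Hypothetical product column: A appears 6 times, B 5 times, ... F once.
products = pd.Series(list("AAAAAA" "BBBBB" "CCCC" "DDD" "EE" "F"))

top5 = products.value_counts().head(5)  # same as slicing with [:5]
print(top5)
```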

##### Percentage of Nulls

- Create a dictionary with column names as keys and percentage of missing values as values.
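One way to build that dictionary, shown on made-up data: the mean of the boolean null mask is the fraction missing, scaled to a percentage.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Sales": [1.0, np.nan, 3.0, np.nan],
    "Region": ["N", "S", None, "E"],
})

null_pct = (df.isna().mean() * 100).to_dict()
print(null_pct)
```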

##### Filter and Aggregate in One Line

- Use a one-liner (with comprehension if needed) to filter all rows where Sales > 1000 and group by "Region" to get the average profit.
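A possible one-liner using boolean indexing and method chaining, on invented data. No comprehension is needed in this form.

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["N", "N", "S", "S"],
    "Sales": [1500, 800, 2000, 1200],
    "Profit": [100.0, 50.0, 300.0, 100.0],
})

avg_profit = df[df["Sales"] > 1000].groupby("Region")["Profit"].mean()
print(avg_profit)
```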

#### 🧠 _Advanced Manipulations_

##### List Comprehension for Filtered Rows

- Use a comprehension to create a list of Invoice_ID for which "Profit" is negative and "Sales" is above average.
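A sketch of the comprehension on made-up rows. The average is computed once up front so it is not recalculated for every row.

```python
import pandas as pd

df = pd.DataFrame({
    "Invoice_ID": ["A1", "A2", "A3", "A4"],
    "Sales": [100.0, 500.0, 900.0, 300.0],
    "Profit": [-5.0, -20.0, 50.0, -1.0],
})

avg_sales = df["Sales"].mean()

flagged = [
    inv
    for inv, sales, profit in zip(df["Invoice_ID"], df["Sales"], df["Profit"])
    if profit < 0 and sales > avg_sales
]
print(flagged)
```

A boolean mask (`df.loc[mask, "Invoice_ID"].tolist()`) gives the same list and is usually faster on large frames; the comprehension is shown because the task asks for one.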

##### Create a Dictionary of Unique Values per Column

- Create a dictionary where each key is a column and the value is the list of its unique values.
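A dictionary comprehension covers this in one line. The sketch below uses invented data; `unique()` keeps values in order of first appearance.

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["N", "S", "N"],
    "Product": ["Pen", "Pen", "Ink"],
})

unique_values = {col: df[col].unique().tolist() for col in df.columns}
print(unique_values)
```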

##### Pivot Analysis

- Pivot the data to show Product as rows and Region as columns with total sales.
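A minimal sketch with `pivot_table`, on invented data. Filling absent Product/Region combinations with 0 (`fill_value=0`) is an assumption; leave it out to keep them as NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Pen", "Pen", "Ink", "Ink", "Pen"],
    "Region": ["N", "S", "N", "N", "N"],
    "Sales": [10.0, 20.0, 5.0, 15.0, 30.0],
})

pivot = df.pivot_table(
    index="Product",    # rows
    columns="Region",   # columns
    values="Sales",
    aggfunc="sum",      # total sales per cell
    fill_value=0,       # combinations with no sales
)
print(pivot)
```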

##### Tuple of Column Summary

- Create a tuple that stores (column_name, min_value, max_value) for numeric columns.
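One reading of this task is a tuple holding one `(column_name, min_value, max_value)` entry per numeric column. A sketch under that assumption, with made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["N", "S"],
    "Sales": [10.0, 30.0],
    "Profit": [-2.0, 4.0],
})

numeric = df.select_dtypes(include="number")  # skip non-numeric columns

summary = tuple(
    (col, numeric[col].min(), numeric[col].max()) for col in numeric.columns
)
print(summary)
```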

##### String Extraction

- Use .str.extract() to pull out numbers from product descriptions like "Item 1234 - ABC".
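A short sketch: the pattern `(\d+)` captures the first run of digits. The extracted values are strings, and rows without a match become NaN. The sample descriptions other than the one quoted above are invented.

```python
import pandas as pd

desc = pd.Series(["Item 1234 - ABC", "Item 56 - XYZ", "No code"])

# expand=False returns a Series when the pattern has a single capture group.
item_no = desc.str.extract(r"(\d+)", expand=False)
print(item_no)
```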

##### Customer Retention Insight

- From a sorted DataFrame of dates, identify customers with more than one transaction in different months (use sets and groupby()).
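A possible sketch, assuming columns named `Customer_ID` and `Date` (invented data). Each customer's transaction months are collected into a set, and customers whose set holds more than one month count as returning.

```python
import pandas as pd

df = pd.DataFrame({
    "Customer_ID": [1, 1, 2, 2, 3],
    "Date": pd.to_datetime(
        ["2024-01-05", "2024-02-10", "2024-01-03", "2024-01-20", "2024-03-01"]
    ),
}).sort_values("Date")

# Year-month period, so January 2024 and January 2025 stay distinct.
df["Month"] = df["Date"].dt.to_period("M")

months_per_customer = {
    cid: set(group["Month"]) for cid, group in df.groupby("Customer_ID")
}
returning = {cid for cid, months in months_per_customer.items() if len(months) > 1}
print(returning)
```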

##### Sorting with Multiple Criteria

- Sort the data by Region ascending and Sales descending.
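A minimal sketch on invented data: `ascending` takes one flag per sort key.

```python
import pandas as pd

df = pd.DataFrame({"Region": ["S", "N", "N", "S"], "Sales": [10, 5, 50, 40]})

sorted_df = df.sort_values(["Region", "Sales"], ascending=[True, False])
print(sorted_df)
```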

##### Nested Dictionary for Aggregation

- Use groupby().agg() with a dictionary like: {'Sales': ['sum', 'mean'], 'Profit': ['min', 'max']}.
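A sketch of the dictionary form, grouping by `Region` as an assumed key on made-up data. The result has two-level column labels; flattening them to names like `Sales_sum` is optional and shown here for convenience.

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["N", "N", "S"],
    "Sales": [10.0, 30.0, 50.0],
    "Profit": [1.0, -3.0, 5.0],
})

summary = df.groupby("Region").agg(
    {"Sales": ["sum", "mean"], "Profit": ["min", "max"]}
)

# Flatten the (column, function) MultiIndex into single-level names.
summary.columns = ["_".join(col) for col in summary.columns]
print(summary)
```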

##### Create a Clean Subset

Generate a clean subset of your dataset excluding rows with:

- Missing values in any numeric columns

- Duplicate Customer_ID
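A sketch under two assumptions: for duplicated `Customer_ID` values the first remaining row is kept, and incomplete rows are removed before de-duplicating so that a customer is not lost just because their first row had a gap. The data is invented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Customer_ID": [1, 2, 2, 3, 3],
    "Name": ["a", "b", "b2", None, "d"],
    "Sales": [10.0, np.nan, 20.0, 30.0, 40.0],
    "Profit": [1.0, 2.0, 3.0, 4.0, 5.0],
})

numeric_cols = df.select_dtypes(include="number").columns.tolist()

clean = (
    df.dropna(subset=numeric_cols)                         # gaps in numeric columns only
      .drop_duplicates(subset="Customer_ID", keep="first")  # one row per customer
)
print(clean)
```

The missing `Name` in row 3 does not remove that row, because only numeric columns are checked.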

##### Export Clean Data

- After applying all cleaning steps, export the clean DataFrame to Excel, one sheet for each Region.
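A possible sketch using `pd.ExcelWriter`. The function name `export_by_region` and the output path are hypothetical. Writing `.xlsx` files requires an Excel engine such as openpyxl to be installed.

```python
import pandas as pd


def export_by_region(df: pd.DataFrame, path: str) -> list:
    """Write one sheet per Region value and return the sheet names used."""
    sheet_names = []
    with pd.ExcelWriter(path) as writer:
        for region, group in df.groupby("Region"):
            sheet = str(region)[:31]  # Excel limits sheet names to 31 characters
            group.to_excel(writer, sheet_name=sheet, index=False)
            sheet_names.append(sheet)
    return sheet_names


# Example call (hypothetical file name):
# export_by_region(clean_df, "clean_data.xlsx")
```

Region values containing characters that Excel forbids in sheet names (such as `/` or `:`) would need to be sanitised first.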