In [1]:
!pip install --upgrade pandas
import pandas as pd
from google.colab import files
uploaded = files.upload()

### 1. Reading Data from CSV with Specific Features
When reading CSV files, we often need to handle special cases:
- Different delimiters (here we use ';')
- Potential encoding issues
- Handling of missing values
- Parsing dates correctly

In [13]:
df = pd.read_csv('session_03_data_practice.csv', delimiter=';', parse_dates=['Creation Date', 'Date when pay'])

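**Note:** The `parse_dates` parameter attempts to convert the listed columns to datetime objects. A column that cannot be parsed is returned unchanged with `object` dtype, so check `df.dtypes` after loading.  
Datetime columns are especially useful for time series analysis.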
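As a minimal sketch of the same call, the snippet below reads a small, made-up semicolon-delimited sample from memory rather than the real file. The rows, and the choice of ISO-formatted dates, are assumptions for illustration. If the real file stores dates in a day-first format, an extra option such as `dayfirst=True` may be needed.

```python
import io

import pandas as pd

# Made-up sample that mimics the layout described above (not the real file).
csv_text = """Number;Creation Date;Date when pay;Status;Money amount;City
1;2024-12-01 10:00:00;2024-12-01 10:02:00;Completed;1500.0;Lviv
2;2024-12-02 09:30:00;;Canceled;0.0;TERNOPIL
3;2024-12-03 14:15:00;2024-12-20 14:15:00;Completed;42750.0;
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",  # same effect as delimiter=';'
    parse_dates=["Creation Date", "Date when pay"],
)

# Empty fields become NaT in date columns and NaN elsewhere.
print(df.dtypes)
print(df["Date when pay"].isna().sum())
```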

### 2. Initial Data Exploration
Understanding your data is the first critical step in any analysis.

**View first 5 rows to get a quick look at the data structure**

**View last 5 rows to check if data is consistent throughout**

**Get basic information about the DataFrame:**
- Number of non-null entries per column
- Data types of each column
- Memory usage

**Generate descriptive statistics for numeric columns:**
- Count, mean, std, min, quartiles, max

**Get DataFrame dimensions (rows, columns)**

**View column names (important for referencing columns correctly)**
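The exploration steps listed above could look roughly like this. The small DataFrame is made-up stand-in data, not the real dataset.

```python
import pandas as pd

# Made-up stand-in data; the real notebook would use the loaded df.
df = pd.DataFrame({
    "Status": ["Completed", "Canceled", "Completed", "Completed", "Canceled", "Completed"],
    "Money amount": [1500.0, 0.0, 42750.0, 1.0, 0.0, 9800.0],
    "City": ["Lviv", "TERNOPIL", None, " Lviv", "Vinnytsia", "Kyiv"],
})

first_rows = df.head()   # first 5 rows
last_rows = df.tail()    # last 5 rows
df.info()                # prints non-null counts, dtypes, memory usage
stats = df.describe()    # count, mean, std, min, quartiles, max (numeric columns only)
n_rows, n_cols = df.shape
column_names = df.columns.tolist()

print(stats)
print(n_rows, n_cols, column_names)
```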

**Initial observations we might make:**
- Mixed data types (numeric, text, dates)
- Some columns have missing values (like 'Date when pay')
- Potential data quality issues (spaces in city names, inconsistent capitalization)
- Numeric columns like 'Money amount' have wide ranges

### 3. Renaming Columns
**Column renaming is important for:**
- Consistency (standard naming conventions)
- Readability (clear, descriptive names)
- Ease of use (avoid spaces/special characters in names)

**Verify the changes**
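A minimal sketch of renaming and verifying follows. The new snake_case names (`creation_date`, `payment_date`, `amount`) are hypothetical choices, not necessarily the names used in the original notebook.

```python
import pandas as pd

# Made-up stand-in data with the original-style column names.
df = pd.DataFrame({
    "Creation Date": pd.to_datetime(["2024-12-01", "2024-12-02"]),
    "Date when pay": pd.to_datetime(["2024-12-01", None]),
    "Money amount": [1500.0, 0.0],
})

# Hypothetical target names: lowercase, no spaces.
df = df.rename(columns={
    "Creation Date": "creation_date",
    "Date when pay": "payment_date",
    "Money amount": "amount",
})

# Verify the changes.
print(df.columns.tolist())
```

Names without spaces can also be used with attribute access (`df.amount`) and inside `df.query(...)` expressions without backtick quoting.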

### 4. Working with Columns and Rows

**Selecting Columns**
- Single column (returns Series)

- Multiple columns (returns DataFrame)
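Both forms of column selection can be sketched as follows, using made-up data.

```python
import pandas as pd

# Made-up stand-in data.
df = pd.DataFrame({
    "Status": ["Completed", "Canceled"],
    "Money amount": [1500.0, 0.0],
    "City": ["Lviv", "Kyiv"],
})

city = df["City"]                      # single brackets: a Series
subset = df[["City", "Money amount"]]  # list of names: a DataFrame

print(type(city).__name__, type(subset).__name__)
```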

**Selecting Rows**
- By index label (`loc`)

- By position (`iloc`)
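The difference between label-based and position-based selection is easier to see with a non-default index. The index values below are made up for illustration.

```python
import pandas as pd

# Made-up data with a non-default index so labels differ from positions.
df = pd.DataFrame(
    {"Status": ["Completed", "Canceled", "Completed"],
     "Money amount": [1500.0, 0.0, 42750.0]},
    index=[101, 102, 103],
)

by_label = df.loc[102]        # row whose index label is 102
by_position = df.iloc[0]      # first row, whatever its label

label_slice = df.loc[101:102]  # label slices include both endpoints
pos_slice = df.iloc[0:2]       # position slices exclude the end

print(by_label["Status"], by_position["Status"])
```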

**Filtering Data**
- Completed transactions only

- High-value transactions (> 20,000)

- Multiple conditions (use & for AND, | for OR)

- Filter by substring match (case-sensitive by default)
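The filters above could be written roughly like this. The rows are made up, and the column names follow the original CSV headers as far as they are known from the text.

```python
import pandas as pd

# Made-up stand-in data.
df = pd.DataFrame({
    "Title": ["AI Engineering", "Frontend Development", "AI Engineering", "ai basics"],
    "Status": ["Completed", "Canceled", "Completed", "Completed"],
    "Money amount": [25000.0, 0.0, 1.0, 42750.0],
})

completed = df[df["Status"] == "Completed"]
high_value = df[df["Money amount"] > 20000]

# Each condition needs its own parentheses when combined with & or |.
completed_high = df[(df["Status"] == "Completed") & (df["Money amount"] > 20000)]
canceled_or_tiny = df[(df["Status"] == "Canceled") | (df["Money amount"] <= 1)]

# str.contains is case-sensitive by default; na=False treats missing titles as non-matches.
ai_titles = df[df["Title"].str.contains("AI", na=False)]
ai_any_case = df[df["Title"].str.contains("AI", case=False, na=False)]

print(len(completed), len(high_value), len(ai_titles), len(ai_any_case))
```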

### 5. Basic Data Operations

**Adding Columns**
- Calculate payment delay in days (for completed transactions)

- Create a binary column indicating high-value transactions
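A minimal sketch of both derived columns, on made-up data. The new column names (`payment_delay_days`, `is_high_value`) and the 20,000 threshold reused from the filtering step are assumptions.

```python
import pandas as pd

# Made-up stand-in data; the second row mimics a canceled transaction with no payment date.
df = pd.DataFrame({
    "Creation Date": pd.to_datetime(["2024-12-01 10:00", "2024-12-02 09:30", "2024-12-03 14:15"]),
    "Date when pay": pd.to_datetime(["2024-12-01 10:02", None, "2024-12-20 14:15"]),
    "Money amount": [1500.0, 0.0, 42750.0],
})

# Subtracting datetimes gives a timedelta; .dt.days keeps whole days.
# Rows without a payment date yield NaN.
df["payment_delay_days"] = (df["Date when pay"] - df["Creation Date"]).dt.days

# 1 for amounts above the (assumed) 20,000 threshold, 0 otherwise.
df["is_high_value"] = (df["Money amount"] > 20000).astype(int)

print(df[["payment_delay_days", "is_high_value"]])
```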

**Removing Columns**
- Drop the temporary column we created
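Dropping a column might look like this. Here `is_high_value` stands in as a hypothetical name for the temporary column.

```python
import pandas as pd

# Made-up stand-in data with a hypothetical temporary column.
df = pd.DataFrame({
    "Money amount": [1500.0, 42750.0],
    "is_high_value": [0, 1],
})

# drop returns a new DataFrame; reassign to keep the result.
df = df.drop(columns=["is_high_value"])

print(df.columns.tolist())
```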

**Sorting Data**
- Sort by amount (descending)

- Multi-column sort
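Both sorts can be sketched as follows on made-up data. The choice of `Status` as the first key in the multi-column sort is an assumption.

```python
import pandas as pd

# Made-up stand-in data.
df = pd.DataFrame({
    "Status": ["Completed", "Canceled", "Completed", "Canceled"],
    "Money amount": [1500.0, 0.0, 42750.0, 0.0],
    "City": ["Lviv", "Kyiv", "Odesa", "Dnipro"],
})

# Largest amounts first.
by_amount = df.sort_values("Money amount", ascending=False)

# Status A-Z, then amount descending within each status.
by_status_amount = df.sort_values(
    ["Status", "Money amount"], ascending=[True, False]
)

print(by_amount["Money amount"].tolist())
print(by_status_amount[["Status", "Money amount"]])
```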

### Additional Important Initial Analysis Steps
**Checking for Missing Values**

**Examining Unique Values**
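A short sketch of checking missing values and unique values, using made-up data:

```python
import pandas as pd

# Made-up stand-in data with a few gaps.
df = pd.DataFrame({
    "Status": ["Completed", "Canceled", "Completed"],
    "City": ["Lviv", None, "Lviv"],
    "Payment System": ["PayPal", None, "Google Pay"],
})

missing_counts = df.isna().sum()   # missing entries per column
missing_share = df.isna().mean()   # fraction missing per column

statuses = df["Status"].unique()   # distinct values, in order of appearance
n_cities = df["City"].nunique()    # distinct count; missing values are excluded by default

print(missing_counts)
print(list(statuses), n_cities)
```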

**Data Quality Checks**
- Check for inconsistent city names (case sensitivity, whitespace)

- Value Counts (useful for categorical data)
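One way to surface the whitespace and capitalization problems is to compare value counts before and after a simple normalization. The data below is made up, and `str.strip().str.title()` is only one possible normalization, not necessarily the one the notebook used.

```python
import pandas as pd

# Made-up city values with stray whitespace and mixed case.
df = pd.DataFrame({"City": ["Lviv", " Lviv", "TERNOPIL", "Ternopil", "Vinnytsia", None]})

# Raw counts treat " Lviv" and "Lviv" as different categories.
raw_counts = df["City"].value_counts(dropna=False)

# Flag values that change when surrounding whitespace is stripped.
has_padding = df["City"].notna() & (df["City"] != df["City"].str.strip())

# Normalize: strip whitespace, then title-case.
normalized = df["City"].str.strip().str.title()
clean_counts = normalized.value_counts()

print(raw_counts)
print(clean_counts)
```

This kind of normalization would not repair merged values such as "OdesaDnipro"; those would need separate handling.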

### Initial Observations:

1. **Data Structure Preview**:
   - The dataset contains transaction records with columns like `Number`, `Creation Date`, `Date when pay`, `Title`, `Status`, `Money amount`, `City`, and `Payment System`.
   - Mixed data types are visible at a glance (dates, numbers, text).

2. **Missing Values**:
   - `Date when pay` is empty for canceled transactions (e.g., rows 3-7 in head).
   - `Money amount` shows `0` for canceled transactions.
   - Some `City` and `Payment System` fields are empty (e.g., row 11 in head).

3. **Status Patterns**:
   - `Completed` status correlates with:
     - Filled payment dates
     - Positive money amounts
     - Specified payment methods
   - `Canceled` status shows:
     - Empty payment dates
     - Zero amounts
     - Often missing payment methods

4. **Data Quality Indicators**:
   - Inconsistent city name formatting:
     - Mixed case (e.g., "TERNOPIL" vs "Vinnytsia")
     - Leading or trailing spaces (e.g., " Lviv" vs "Lviv")
     - Combined city names ("OdesaDnipro")
   - Payment method names use different languages (Ukrainian and English)

5. **Temporal Patterns**:
   - Transactions span December 2024 (based on creation dates)
   - Payment delays vary (e.g., row 1 shows 2-minute delay, while row 14 shows 17-day delay)

6. **Business Context Clues**:
   - Course titles suggest an educational platform ("AI Engineering", "Frontend Development", etc.)
   - Payment amounts vary significantly (from 1.0 to 42750.0)
   - Multiple payment systems are supported (Google Pay, PayPal, bank transfers)

7. **Potential Anomalies**:
   - Very small payments (1.0) in rows 28-34 might represent test transactions
   - Some payment dates (e.g., "01.01.2020" in the last row) fall well before the December 2024 creation dates

**Key Questions for Further Investigation**:
- Why do some completed transactions have payment dates before creation dates?
- Should city names be standardized (case, spacing)?
- Are the 1.0 payments valid or system artifacts?
- What explains the extreme payment amount range?
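As a starting point for the first and third questions, the suspicious rows could be isolated with two filters. This is a sketch on made-up data; the exact-match test for `1.0` is an assumption about how such payments are stored.

```python
import pandas as pd

# Made-up stand-in data reproducing the anomalies described above.
df = pd.DataFrame({
    "Creation Date": pd.to_datetime(["2024-12-01", "2024-12-05", "2024-12-10"]),
    "Date when pay": pd.to_datetime(["2024-12-01", "2020-01-01", None]),
    "Status": ["Completed", "Completed", "Canceled"],
    "Money amount": [1.0, 9800.0, 0.0],
})

# Payments recorded before the transaction was created.
# Comparisons against a missing date (NaT) evaluate to False, so those rows drop out.
paid_before_created = df[df["Date when pay"] < df["Creation Date"]]

# Completed transactions with a 1.0 amount (possible test payments).
tiny_payments = df[(df["Status"] == "Completed") & (df["Money amount"] == 1.0)]

print(paid_before_created)
print(tiny_payments)
```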

This initial inspection reveals both the dataset's structure and the data quality issues that need to be addressed before deeper analysis.