Load a comma-separated values (CSV) file into a DataFrame:

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv('Sales_August_2019.csv')
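
If the file is not at hand, the same call can be tried on in-memory text. The sketch below is only an illustration: the column names and values are invented sample data, not the contents of `Sales_August_2019.csv`.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for a real CSV file.
csv_text = """Order ID,Product,Quantity Ordered,Price Each
1001,USB-C Cable,2,11.95
1002,Wired Headphones,1,11.99
1003,USB-C Cable,1,11.95
"""

# read_csv accepts a file path, a URL, or a file-like object such as StringIO.
sample_df = pd.read_csv(io.StringIO(csv_text))
print(sample_df.shape)
```

The first line of the text is used as the column headers, and each following line becomes one row.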

Viewing the Data

One of the most commonly used methods for getting a quick overview of a DataFrame is the **head()** method.

The **head()** method returns the column headers and a specified number of rows, starting from the top. If no number is given, it returns the first 5 rows.

Print the first 5 rows of the DataFrame:



In [None]:
df.head()

Get a quick overview by printing the first 10 rows of the DataFrame:

In [None]:
df.head(10)

There is also a **tail()** method for viewing the last rows of the DataFrame.

The **tail()** method returns the column headers and a specified number of rows, starting from the bottom. If no number is given, it returns the last 5 rows.


In [None]:
df.tail()
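
To see how the row-count argument behaves for both methods, here is a small sketch on a made-up ten-row DataFrame (the `demo` frame and its `row` column are invented for illustration):

```python
import pandas as pd

# Hypothetical data: ten rows numbered 0 to 9.
demo = pd.DataFrame({"row": range(10)})

first_three = demo.head(3)  # the three rows at the top
last_three = demo.tail(3)   # the three rows at the bottom

print(first_three)
print(last_three)
```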

A DataFrame object has a method called **info()** that gives you more information about the data set, such as the column names, data types, and memory usage.

In [None]:
df.info()

The **info()** method also tells us how many **Non-Null** values are present in each column.

**Empty** values, or **Null** values, can distort results when analyzing data, so you should consider removing rows that contain them. This is one step in what is called data cleaning.
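
One common way to remove such rows is `dropna()`. The sketch below uses a small invented DataFrame, and whether dropping rows is appropriate depends on the data set:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing product name and one missing price.
demo = pd.DataFrame({
    "Product": ["Cable", None, "Monitor"],
    "Price": [11.95, 3.84, np.nan],
})

# dropna() returns a new DataFrame without the rows that contain
# any missing value; the original DataFrame is left unchanged.
cleaned = demo.dropna()

print(len(demo), len(cleaned))
```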


The **describe()** method generates descriptive statistics for numerical columns, such as the count, mean, standard deviation, minimum, maximum, and quartiles. The median appears as the 50% row.

Usage:
```python
df.describe()
```

In [None]:
df.describe()
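
The result of `describe()` is itself a DataFrame, so individual statistics can be looked up by row label. A minimal sketch on invented data (the `demo` frame is hypothetical):

```python
import pandas as pd

# Hypothetical data: one text column and one numeric column.
demo = pd.DataFrame({
    "Product": ["A", "B", "C", "D"],
    "Price": [10.0, 20.0, 30.0, 40.0],
})

# By default only the numeric column is summarized.
stats = demo.describe()
print(stats)

# The median is stored under the "50%" row label.
median_price = stats.loc["50%", "Price"]
print(median_price)
```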

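The **value_counts()** method counts how often each unique value occurs in a column, which is helpful for categorical data. The result is sorted with the most frequent value first.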

Usage:

```python
df['column_name'].value_counts()
```


In [None]:
df['Product'].value_counts()
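
As a small sketch on invented product names, the counts can also be expressed as fractions by passing `normalize=True`:

```python
import pandas as pd

# Hypothetical product column.
products = pd.Series(
    ["Cable", "Cable", "Monitor", "Cable", "Monitor", "Phone"],
    name="Product",
)

counts = products.value_counts()                # occurrences, most frequent first
shares = products.value_counts(normalize=True)  # fractions that sum to 1

print(counts)
print(shares)
```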

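The **isnull()** method detects missing values in the dataset. Chaining **sum()** onto it gives the number of missing values in each column.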

Usage:
```python
df.isnull().sum()
```

In [None]:
df.isnull().sum()
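
The two steps can be looked at separately. In this sketch on invented data, `isnull()` produces a True/False table of the same shape, and `sum()` then counts the True values per column:

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing product names and one missing price.
demo = pd.DataFrame({
    "Product": ["Cable", None, "Monitor", None],
    "Price": [11.95, 3.84, np.nan, 9.99],
})

mask = demo.isnull()     # True where a value is missing
per_column = mask.sum()  # each True counts as 1

print(mask)
print(per_column)
```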

# Assignment 2: Basic Data Exploration Using Pandas

## Objective
This assignment aims to help students practice basic data exploration techniques using Pandas through loading data, exploring structure, checking for missing values, and performing preliminary analysis.

## Instructions

### Setting Up Google Colab
1. Open Google Colab and create a new notebook titled "Basic Data Exploration with Pandas"
2. Make sure you are connected to a Python runtime

---

## Part 1: Data Loading

**Task 1**: Import necessary libraries and load data from the provided URL

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load data from URL
url = 'https://github.com/9meo/bas240/raw/main/data/Sales_October_2019.csv'
df = pd._____(url)  # Fill in the correct command
```

**Task 2**: Display the first 10 rows and shape of the data

```python
# Display first 10 rows
print("First 10 rows of data:")
df._____(10)  # Fill in the correct command

# Display data dimensions (number of rows, columns)
print(f"\nData shape: {df._____}")  # Fill in the correct command
```

---

## Part 2: Data Structure Exploration

**Task 3**: Use the `info()` method to view overall dataset information

```python
print("Dataset overview:")
df._____()  # Fill in the correct command
```

**Task 4**: Generate descriptive statistics for numerical data

```python
print("Descriptive statistics:")
df._____()  # Fill in the correct command
```

**Task 5**: Display all column names and data types

```python
print("Column names:")
print(df._____)  # Fill in the correct command

print("\nData types for each column:")
print(df._____)  # Fill in the correct command
```

---

## Part 3: Missing Values Check

**Task 6**: Check for missing values in each column

```python
print("Number of missing values in each column:")
missing_values = df._____().___()  # Fill in the correct commands
print(missing_values)

# Calculate percentage of missing values
print("\nPercentage of missing data:")
missing_percentage = (missing_values / len(df)) * 100
print(missing_percentage.round(2))
```

---

## Part 4: Categorical Data Analysis

**Task 7**: Analyze Product data

```python
print("Count of each product type:")
product_counts = df['Product']._____()  # Fill in the correct command
print(product_counts)

print(f"\nTotal number of different products: {len(product_counts)} types")
# value_counts() counts rows, not units, so these are order-row counts
print(f"Most frequently ordered product: {product_counts.index[0]}")
print(f"Number of order rows: {product_counts.iloc[0]}")
```

**Task 8**: Analyze City data (if available)

```python
# Check if City column exists
if 'City' in df.columns:
    print("Sales count by city:")
    city_counts = df['City']._____()  # Fill in the correct command
    print(city_counts.head(10))  # Show only top 10 cities
else:
    print("City column not found in the dataset")
```

---

## Part 5: Basic Analysis

**Task 9**: Calculate price statistics (if price columns exist)

```python
# Check for price-related columns
price_columns = [col for col in df.columns if 'price' in col.lower() or 'amount' in col.lower()]
print(f"Price-related columns: {price_columns}")

# If price columns exist, calculate statistics
if price_columns:
    for col in price_columns:
        if pd.api.types.is_numeric_dtype(df[col]):  # skip text columns
            print(f"\nStatistics for {col}:")
            print(f"- Mean: ${df[col].mean():.2f}")
            print(f"- Median: ${df[col].median():.2f}")
            print(f"- Maximum: ${df[col].max():.2f}")
            print(f"- Minimum: ${df[col].min():.2f}")
```
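
The dtype check above skips any column that was read as text, which can happen when a numeric column contains stray non-numeric entries. A minimal sketch, under the assumption that such entries should be treated as missing, converts the column first with `pd.to_numeric`. The column name and values below are invented for illustration.

```python
import pandas as pd

# Hypothetical price column stored as text, with one non-numeric entry.
demo = pd.DataFrame({"Price Each": ["11.95", "3.84", "Price Each", "150"]})

# errors="coerce" turns entries that cannot be parsed into NaN
# instead of raising an error.
prices = pd.to_numeric(demo["Price Each"], errors="coerce")

print(prices)
print(f"Mean: ${prices.mean():.2f}")  # NaN values are ignored by mean()
```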

**Task 10**: Create basic visualizations

```python
# Bar chart showing top 10 best-selling products
plt.figure(figsize=(12, 6))
top_products = df['Product'].value_counts().head(10)
top_products.plot(kind='bar')
plt.title('Top 10 Best-Selling Products')
plt.xlabel('Product')
plt.ylabel('Number of Order Rows')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
```

---

## Part 6: Analysis Summary

**Task 11**: Write an analysis summary

```python
print("=== Data Analysis Summary ===")
print(f"1. Dataset contains {df.shape[0]} rows and {df.shape[1]} columns")
print(f"2. Number of different product types: {df['Product'].nunique()} types")
print(f"3. Missing data points: {df.isnull().sum().sum()} values")

# Write your additional summary here
```

---

## Submission

1. **Save your work**: Save the notebook to your Google Drive
2. **Check your work**: Ensure all code cells run correctly
3. **Download**: Download the file as .ipynb
4. **Submit**: Submit the .ipynb file through the course platform

---

## Grading Criteria

- **Correct data loading**: 15 points
- **Proper use of info(), describe(), head() functions**: 20 points  
- **Missing values check**: 15 points
- **Categorical data analysis**: 20 points
- **Visualization and additional analysis**: 20 points
- **Summary and code correctness**: 10 points

**Total**: 100 points

---

## Tips

1. Read error messages carefully if code doesn't work
2. Use `print()` statements to display results clearly
3. Write comments in your code to explain what you're doing
4. Try experimenting with additional Pandas functions