# Getting Started with Python Data Science

Welcome! This notebook will help you get started with data analysis in Python.

## What You'll Learn
1. Working with pandas DataFrames
2. Basic data analysis
3. Creating visualizations
4. Reading and writing Excel files

Let's begin!

## 1. Import Libraries

First, let's import the libraries we'll use:

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set display options
pd.set_option('display.max_columns', None)
print("✓ Libraries imported successfully!")

## 2. Create Your First DataFrame

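A DataFrame is a table of labeled rows and columns, much like an Excel spreadsheet. You can build one from a dictionary that maps each column name to a list of values: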

In [None]:
# Create sample sales data
data = {
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'Sales': [1000, 1500, 1200, 1800, 2000, 1700],
    'Expenses': [800, 900, 850, 1000, 1100, 950],
    'Region': ['East', 'East', 'West', 'West', 'East', 'West']
}

df = pd.DataFrame(data)
print("Your DataFrame:")
df
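Before analyzing a new DataFrame, it helps to check its size, its column types, and a few rows. A minimal sketch using the standard pandas attributes `shape` and `dtypes` and the method `head()` on the same sample data:

```python
import pandas as pd

# Same sample sales data as above
data = {
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'Sales': [1000, 1500, 1200, 1800, 2000, 1700],
    'Expenses': [800, 900, 850, 1000, 1100, 950],
    'Region': ['East', 'East', 'West', 'West', 'East', 'West'],
}
df = pd.DataFrame(data)

# (number of rows, number of columns)
print(df.shape)                        # (6, 4)

# Data type pandas inferred for each column
print(df.dtypes)

# One column in single brackets gives a Series; a list of columns gives a DataFrame
print(df['Sales'].max())               # 2000
print(df[['Month', 'Sales']].head(3))  # first three rows of two columns
```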

## 3. Basic Data Analysis

Let's add a profit column and compute a few summary statistics:

In [None]:
# Calculate profit
df['Profit'] = df['Sales'] - df['Expenses']

# Basic statistics
print("Summary Statistics:")
print(f"Total Sales: ${df['Sales'].sum():,}")
print(f"Average Sales: ${df['Sales'].mean():,.2f}")
print(f"Total Profit: ${df['Profit'].sum():,}")
print(f"Profit Margin: {(df['Profit'].sum() / df['Sales'].sum() * 100):.1f}%")
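The profit margin above is computed from the totals. A per-month margin can be added as its own column; this sketch also shows why averaging the monthly margins gives a different number than the overall margin (the overall figure weights each month by its sales). The column name `Margin %` is just an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'Sales': [1000, 1500, 1200, 1800, 2000, 1700],
    'Expenses': [800, 900, 850, 1000, 1100, 950],
})
df['Profit'] = df['Sales'] - df['Expenses']

# Margin for each month: that month's profit as a percentage of its sales
df['Margin %'] = (df['Profit'] / df['Sales'] * 100).round(1)

# Overall margin from the totals (sales-weighted)
overall = df['Profit'].sum() / df['Sales'].sum() * 100

# Simple average of the monthly margins (every month counts equally)
unweighted = (df['Profit'] / df['Sales'] * 100).mean()

print(df[['Month', 'Profit', 'Margin %']])
print(f"Overall margin: {overall:.1f}%")               # Overall margin: 39.1%
print(f"Average monthly margin: {unweighted:.1f}%")    # Average monthly margin: 37.1%
```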

In [None]:
# Group by region
print("\nSales by Region:")
df.groupby('Region')[['Sales', 'Profit']].sum()
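To compute different statistics per group in one step, `groupby(...).agg` accepts named aggregations (available since pandas 0.25): each keyword becomes an output column, built from a `(source column, function)` pair. The output names `total_sales`, `avg_profit`, and `months` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'Sales': [1000, 1500, 1200, 1800, 2000, 1700],
    'Expenses': [800, 900, 850, 1000, 1100, 950],
    'Region': ['East', 'East', 'West', 'West', 'East', 'West'],
})
df['Profit'] = df['Sales'] - df['Expenses']

# One row per region, one column per named aggregation
summary = df.groupby('Region').agg(
    total_sales=('Sales', 'sum'),
    avg_profit=('Profit', 'mean'),
    months=('Month', 'count'),
)
print(summary.round(2))
```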

## 4. Create Visualizations

Plot the monthly sales trend, then compare sales and expenses side by side:

In [None]:
# Sales trend
plt.figure(figsize=(10, 5))
plt.plot(df['Month'], df['Sales'], marker='o', linewidth=2, markersize=8)
plt.title('Monthly Sales Trend', fontsize=14, fontweight='bold')
plt.xlabel('Month')
plt.ylabel('Sales ($)')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
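`plt.show()` only displays the chart. To keep a copy as an image file, the figure can be written with `savefig`. A minimal sketch using the object-oriented `fig, ax` interface (the same one used in the next cell); the filename `sales_trend.png` is a hypothetical example:

```python
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'Sales': [1000, 1500, 1200, 1800, 2000, 1700],
})

# Same line chart as above, drawn on an explicit Axes object
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(df['Month'], df['Sales'], marker='o', linewidth=2, markersize=8)
ax.set_title('Monthly Sales Trend', fontsize=14, fontweight='bold')
ax.set_xlabel('Month')
ax.set_ylabel('Sales ($)')
ax.grid(True, alpha=0.3)
fig.tight_layout()

# Write the figure to a PNG file; dpi sets the resolution
output_file = Path('sales_trend.png')  # hypothetical filename
fig.savefig(output_file, dpi=150)
print(f"Saved {output_file}")
```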

In [None]:
# Sales vs Expenses
fig, ax = plt.subplots(figsize=(10, 5))
x = np.arange(len(df['Month']))
width = 0.35

ax.bar(x - width/2, df['Sales'], width, label='Sales', color='#2E86AB')
ax.bar(x + width/2, df['Expenses'], width, label='Expenses', color='#A23B72')

ax.set_xlabel('Month')
ax.set_ylabel('Amount ($)')
ax.set_title('Sales vs Expenses Comparison', fontweight='bold')
ax.set_xticks(x)
ax.set_xticklabels(df['Month'])
ax.legend()
ax.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

## 5. Working with Excel Files

Save your data to an Excel file. Writing `.xlsx` files requires the `openpyxl` package (`pip install openpyxl`):

In [None]:
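# Save to Excel (create the output folder first if it doesn't exist)
from pathlib import Path

output_path = Path('../outputs/sales_analysis.xlsx')
output_path.parent.mkdir(parents=True, exist_ok=True)
df.to_excel(output_path, index=False, sheet_name='Sales Data')
print(f"✓ Data saved to {output_path}")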

# You can also read it back:
# df_loaded = pd.read_excel(output_path)
# print(df_loaded.head())
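To put several tables in one workbook, `pd.ExcelWriter` lets each DataFrame write to its own sheet, and `pd.read_excel(..., sheet_name=None)` reads every sheet back into a dictionary keyed by sheet name. A sketch assuming `openpyxl` is installed; it writes to a temporary folder rather than `../outputs`, and the filename `sales_report.xlsx` and sheet name `By Region` are hypothetical:

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'Sales': [1000, 1500, 1200, 1800, 2000, 1700],
    'Expenses': [800, 900, 850, 1000, 1100, 950],
    'Region': ['East', 'East', 'West', 'West', 'East', 'West'],
})
df['Profit'] = df['Sales'] - df['Expenses']
by_region = df.groupby('Region')[['Sales', 'Profit']].sum()

# Temporary folder so the example leaves your project files alone
path = Path(tempfile.mkdtemp()) / 'sales_report.xlsx'

# One workbook, one sheet per DataFrame
with pd.ExcelWriter(path) as writer:
    df.to_excel(writer, sheet_name='Sales Data', index=False)
    by_region.to_excel(writer, sheet_name='By Region')  # Region index becomes a column

# sheet_name=None returns {sheet name: DataFrame}
sheets = pd.read_excel(path, sheet_name=None)
print(list(sheets))          # ['Sales Data', 'By Region']
print(sheets['By Region'])
```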

## 6. Next Steps

Try these exercises:

1. **Add more data**: Add more months to the DataFrame
2. **Filter data**: Find months where Sales > 1500
3. **Create new calculations**: Calculate a running total of sales
4. **Use your own data**: Replace this sample data with your own Excel file
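### Example: Add more months

One way to approach exercise 1 is to build the new rows as a small DataFrame and stack them onto the original with `pd.concat` (`DataFrame.append` was removed in pandas 2.0). The July and August figures below are made-up placeholder values for illustration only:

```python
import pandas as pd

df = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'Sales': [1000, 1500, 1200, 1800, 2000, 1700],
    'Expenses': [800, 900, 850, 1000, 1100, 950],
    'Region': ['East', 'East', 'West', 'West', 'East', 'West'],
})

# Placeholder values, not real data: replace with your own numbers
new_rows = pd.DataFrame({
    'Month': ['Jul', 'Aug'],
    'Sales': [1900, 2100],
    'Expenses': [1000, 1150],
    'Region': ['East', 'West'],
})

# Stack the rows; ignore_index=True renumbers the index 0..n-1
df = pd.concat([df, new_rows], ignore_index=True)
print(df.tail(3))
print(len(df))   # 8
```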

### Example: Filter data
```python
high_sales = df[df['Sales'] > 1500]
print(high_sales)
```
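To combine conditions, wrap each one in parentheses and join them with `&` (and) or `|` (or); Python's `and`/`or` keywords do not work element-wise on a Series. `DataFrame.query` expresses the same filter as a string. A short sketch on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'Sales': [1000, 1500, 1200, 1800, 2000, 1700],
    'Region': ['East', 'East', 'West', 'West', 'East', 'West'],
})

# Months in the West region with Sales above 1500
mask = (df['Sales'] > 1500) & (df['Region'] == 'West')
west_high = df[mask]

# The same filter written with query()
same = df.query("Sales > 1500 and Region == 'West'")

print(west_high)   # the Apr and Jun rows
```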

### Example: Read your Excel file
```python
my_data = pd.read_excel('../data/raw/your_file.xlsx')
print(my_data.head())
```
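### Example: Running total

For exercise 3, `cumsum()` adds each value to the sum of all rows before it. Combined with `groupby`, it restarts the running total for each region while keeping the original row order. The column names `Running Sales` and `Running by Region` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'Sales': [1000, 1500, 1200, 1800, 2000, 1700],
    'Region': ['East', 'East', 'West', 'West', 'East', 'West'],
})

# Running total across all months
df['Running Sales'] = df['Sales'].cumsum()

# Separate running total within each region
df['Running by Region'] = df.groupby('Region')['Sales'].cumsum()

print(df)
```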

### Resources
- **GETTING_STARTED.md**: Complete setup guide
- **DEPENDENCY_QUICK_REFERENCE.md**: Common commands
- **Pandas Documentation**: https://pandas.pydata.org/docs/

Happy analyzing! 📊