# Python Basics for Data Analysis

Welcome to your first Python notebook! This lesson bridges your Excel knowledge with Python programming, focusing on practical data analysis concepts.

## Learning Objectives
1. Understand Python basics in a data analysis context
2. Learn Excel to Python translations
3. Practice with real-world examples

## 1. Python vs Excel: Basic Concepts

Let's start by comparing familiar Excel concepts with their Python equivalents:

| Excel | Python | Description |
|-------|--------|-------------|
| Cell | Variable | Stores a single value |
| Row/Column | List/Array | Stores multiple values |
| Sheet | DataFrame | 2D data structure |
| Formula | Function | Performs calculations |

In [74]:
# Basic Variables (like Excel cells)
revenue = 1000
growth_rate = 0.15
product_name = "Widget"

# Calculate future revenue (like Excel formula)
future_revenue = revenue * (1 + growth_rate)
print(f"Future revenue for {product_name}: ${future_revenue}")

Future revenue for Widget: $1150.0
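
The raw float above prints as `1150.0`. As a minimal sketch, f-string format specifiers can play the role of Excel's cell number formats. The variables repeat the cell above; the names `formatted_revenue` and `formatted_rate` and the chosen formats are illustrative, not the only reasonable option.

```python
revenue = 1000
growth_rate = 0.15
product_name = "Widget"

future_revenue = revenue * (1 + growth_rate)

# ",.2f" adds a thousands separator and fixes two decimal places
formatted_revenue = f"${future_revenue:,.2f}"
# ".0%" multiplies by 100 and appends a percent sign
formatted_rate = f"{growth_rate:.0%}"

print(f"Future revenue for {product_name}: {formatted_revenue} (growth {formatted_rate})")
```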


## 2. Lists and Arrays (like Excel Ranges)

In Excel, you often work with ranges of cells. In Python, we use lists and arrays for similar purposes.

In [75]:
# Creating a list of monthly sales
monthly_sales = [1200, 1350, 1400, 1300, 1500, 1600]

# Calculating total sales (like Excel SUM)
total_sales = sum(monthly_sales)

# Calculating average sales (like Excel AVERAGE)
average_sales = sum(monthly_sales) / len(monthly_sales)

print(f"Total sales: ${total_sales}")
print(f"Average sales: ${average_sales:.2f}")

Total sales: $8350
Average sales: $1391.67
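
Other common Excel range functions have direct counterparts too. The sketch below uses the standard-library `statistics` module plus the built-in `max` and `min` on the same `monthly_sales` list; the result variable names are illustrative.

```python
import statistics

monthly_sales = [1200, 1350, 1400, 1300, 1500, 1600]

mean_sales = statistics.mean(monthly_sales)      # like Excel AVERAGE
median_sales = statistics.median(monthly_sales)  # like Excel MEDIAN
highest_sales = max(monthly_sales)               # like Excel MAX
lowest_sales = min(monthly_sales)                # like Excel MIN

print(f"Mean: ${mean_sales:.2f}, median: ${median_sales:.2f}")
print(f"Highest: ${highest_sales}, lowest: ${lowest_sales}")
```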


## 3. Control Structures

Where Excel relies on IF formulas, Python offers `if`/`elif`/`else` branches and `for` loops, which can apply the same logic to every value in a list.

In [76]:
# Example: Sales Performance Analysis
for month, sales in enumerate(monthly_sales, 1):
    if sales > average_sales:
        performance = "Above average"
    elif sales == average_sales:
        performance = "Average"
    else:
        performance = "Below average"
    
    print(f"Month {month}: ${sales} - {performance}")

Month 1: $1200 - Below average
Month 2: $1350 - Below average
Month 3: $1400 - Above average
Month 4: $1300 - Below average
Month 5: $1500 - Above average
Month 6: $1600 - Above average
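
The same branching can be wrapped in a small helper and applied to the whole list at once, much like filling an IF formula down a column. This is a sketch only: the helper name `classify` is hypothetical, and `math.isclose` is swapped in for `==` because exact equality between floats is fragile.

```python
import math

def classify(sales, average):
    """Label a sales figure relative to the average."""
    # isclose avoids relying on exact equality between floats
    if math.isclose(sales, average):
        return "Average"
    # Conditional expression: the closest Python analogue of Excel's IF()
    return "Above average" if sales > average else "Below average"

monthly_sales = [1200, 1350, 1400, 1300, 1500, 1600]
average_sales = sum(monthly_sales) / len(monthly_sales)

# List comprehension: apply the helper to every month
labels = [classify(sales, average_sales) for sales in monthly_sales]
above_count = labels.count("Above average")

print(labels)
print(f"Months above average: {above_count}")
```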


## 4. Functions (like Excel Formulas)

Instead of copying the same formula from cell to cell as in Excel, Python lets you define a function once and reuse it for every calculation.

In [77]:
def calculate_growth(current_value, previous_value):
    """Calculate percentage growth (like Excel's percentage change)"""
    return (current_value - previous_value) / previous_value * 100

# Calculate month-over-month growth
for i in range(1, len(monthly_sales)):
    growth = calculate_growth(monthly_sales[i], monthly_sales[i-1])
    print(f"Month {i+1} growth: {growth:.1f}%")

Month 2 growth: 12.5%
Month 3 growth: 3.7%
Month 4 growth: -7.1%
Month 5 growth: 15.4%
Month 6 growth: 6.7%
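
One case the function above does not handle is a previous value of zero, which Excel would report as `#DIV/0!`. The variant below is a sketch under the assumption that returning `None` is an acceptable way to signal "undefined"; it also uses `zip` to pair each month with the next instead of indexing.

```python
def calculate_growth(current_value, previous_value):
    """Return percentage growth, or None when the previous value is zero."""
    if previous_value == 0:
        return None  # growth is undefined without a baseline
    return (current_value - previous_value) / previous_value * 100

monthly_sales = [1200, 1350, 1400, 1300, 1500, 1600]

# zip pairs each month with the one after it: (1200, 1350), (1350, 1400), ...
growth_rates = [
    calculate_growth(current, previous)
    for previous, current in zip(monthly_sales, monthly_sales[1:])
]
rounded_rates = [round(rate, 1) for rate in growth_rates]

print(rounded_rates)
```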


## Practice Exercises

1. Create a list of product prices and calculate:
   - Total revenue
   - Average price
   - Number of products above average price

2. Write a function to calculate profit margin:
   - Input: revenue and cost
   - Output: profit margin percentage

3. Analyze monthly sales data:
   - Find highest and lowest months
   - Calculate quarter-over-quarter growth
   - Identify months with negative growth
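
As a starting point for exercise 3, here is one possible sketch in plain Python. It reuses the `monthly_sales` list from section 2 as stand-in data and assumes a quarter is simply three consecutive months; all other variable names are illustrative.

```python
monthly_sales = [1200, 1350, 1400, 1300, 1500, 1600]

# Highest and lowest months (1-based, like row numbers in a sheet)
highest_month = monthly_sales.index(max(monthly_sales)) + 1
lowest_month = monthly_sales.index(min(monthly_sales)) + 1

# Quarter totals: sum each block of three months
quarter_totals = [
    sum(monthly_sales[i:i + 3]) for i in range(0, len(monthly_sales), 3)
]
quarter_growth = (quarter_totals[1] - quarter_totals[0]) / quarter_totals[0] * 100

# Months whose sales fell compared with the previous month
negative_growth_months = [
    month
    for month, (previous, current) in enumerate(
        zip(monthly_sales, monthly_sales[1:]), start=2
    )
    if current < previous
]

print(f"Highest month: {highest_month}, lowest month: {lowest_month}")
print(f"Quarter totals: {quarter_totals}, growth: {quarter_growth:.2f}%")
print(f"Months with negative growth: {negative_growth_months}")
```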

In [78]:
import pandas as pd
# Raw string (r'...') so the backslashes in the Windows path are not read as escape sequences
sales = pd.read_csv(r'E:\Data Science\gachau_learning\datasets\sales\sample_sales_data.csv')
print(sales.head())

         Date   Product     Category  Quantity  Unit_Price  Total_Sales Region
0  2023-01-01  Widget A  Electronics        50       29.99      1499.50   East
1  2023-01-01  Gadget B  Electronics        30       49.99      1499.70   West
2  2023-01-02    Tool C     Hardware        75       15.99      1199.25  North
3  2023-01-02  Widget A  Electronics        45       29.99      1349.55  South
4  2023-01-03  Gadget B  Electronics        60       49.99      2999.40   East
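
A note on Windows file paths: in an ordinary Python string a backslash starts an escape sequence, and sequences such as `\D` or `\s` are not valid escapes, so recent Python versions emit a warning for them. The sketch below shows two common ways to write such a path safely, using only the standard library; `PureWindowsPath` is used here just to show that both spellings refer to the same file.

```python
from pathlib import PureWindowsPath

# A raw string (r'...') keeps every backslash literal
raw_path = r'E:\Data Science\gachau_learning\datasets\sales\sample_sales_data.csv'

# Forward slashes also work for Windows paths and need no escaping
slash_path = 'E:/Data Science/gachau_learning/datasets/sales/sample_sales_data.csv'

# PureWindowsPath treats both spellings as the same path
same_file = PureWindowsPath(raw_path) == PureWindowsPath(slash_path)
file_name = PureWindowsPath(raw_path).name

print(same_file)
print(file_name)
```

Either string form can be passed to `pd.read_csv`.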



In [79]:
widget_a = sales[sales['Product'] == 'Widget A']
print(widget_a)

          Date   Product     Category  Quantity  Unit_Price  Total_Sales  \
0   2023-01-01  Widget A  Electronics        50       29.99      1499.50   
3   2023-01-02  Widget A  Electronics        45       29.99      1349.55   
6   2023-01-04  Widget A  Electronics        80       29.99      2399.20   
9   2023-01-05  Widget A  Electronics        35       29.99      1049.65   
12  2023-01-07  Widget A  Electronics        65       29.99      1949.35   
15  2023-01-08  Widget A  Electronics        30       29.99       899.70   
18  2023-01-10  Widget A  Electronics        70       29.99      2099.30   

   Region  
0    East  
3   South  
6   North  
9    West  
12   East  
15  South  
18  North  


In [80]:
# Calculate the total revenue of Widget A
total_revenue = widget_a['Total_Sales'].sum()

# Calculate the average revenue per Widget A sale
average_revenue = widget_a['Total_Sales'].mean()

print(f'The total revenue of Widget A is {total_revenue}')
print(f'The average revenue of Widget A is {average_revenue}')

The total revenue of Widget A is 11246.25
The average revenue of Widget A is 1606.607142857143


In [81]:
# Number of Widget A sales with above-average revenue
above_average_revenue = widget_a['Total_Sales'] > widget_a['Total_Sales'].mean()
print(f'The number of sales with above-average revenue is {above_average_revenue.sum()}')

The number of sales with above-average revenue is 3
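
Summing a comparison works because Python counts `True` as 1 and `False` as 0. A plain-Python sketch of the same idea, using the seven Widget A `Total_Sales` values printed earlier (the list and variable names are illustrative):

```python
widget_a_sales = [1499.50, 1349.55, 2399.20, 1049.65, 1949.35, 899.70, 2099.30]
average_sale = sum(widget_a_sales) / len(widget_a_sales)

# One True/False flag per sale, like a helper column in Excel
is_above = [sale > average_sale for sale in widget_a_sales]

# Adding up the flags counts the True values, like Excel COUNTIF
above_count = sum(is_above)

print(is_above)
print(above_count)
```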


In [82]:
# Sales performance analysis for Widget A
# ("mauzo" is Swahili for "sales")
for index, mauzo in widget_a['Total_Sales'].items():
    if mauzo > average_revenue:
        performance = "Above average"
    elif mauzo == average_revenue:
        performance = "Average"
    else:
        performance = "Below average"
    
    print(f"Item {index}: ${mauzo} - {performance}")

Item 0: $1499.5 - Below average
Item 3: $1349.55 - Below average
Item 6: $2399.2 - Above average
Item 9: $1049.65 - Below average
Item 12: $1949.35 - Above average
Item 15: $899.7 - Below average
Item 18: $2099.3 - Above average


In [83]:
# Exercise 2: write a function to calculate profit margin
# Input: revenue and cost
# Output: profit margin percentage

unit_cost = 20.00  # cost per unit

In [84]:
# Units in full batches of 10 get a 10% discount; leftover units cost full price
def calculate_cost(quantity):
    batch_size = 10
    discount_rate = 0.1
    unit_cost = 20

    batches = quantity // batch_size
    discount_cost = unit_cost * (1 - discount_rate) * (batches * batch_size)
    remaining_cost = unit_cost * (quantity % batch_size)

    return discount_cost + remaining_cost

# Work on an independent copy so pandas does not raise SettingWithCopyWarning
widget_a = widget_a.copy()
widget_a['Total_Cost'] = widget_a['Quantity'].apply(calculate_cost)

# Profit margin percentage: profit (revenue minus cost) as a share of revenue
widget_a['Profit_Margin'] = ((widget_a['Total_Sales'] - widget_a['Total_Cost']) / widget_a['Total_Sales'] * 100).round(2)

print(widget_a)

          Date   Product     Category  Quantity  Unit_Price  Total_Sales  \
0   2023-01-01  Widget A  Electronics        50       29.99      1499.50   
3   2023-01-02  Widget A  Electronics        45       29.99      1349.55   
6   2023-01-04  Widget A  Electronics        80       29.99      2399.20   
9   2023-01-05  Widget A  Electronics        35       29.99      1049.65   
12  2023-01-07  Widget A  Electronics        65       29.99      1949.35   
15  2023-01-08  Widget A  Electronics        30       29.99       899.70   
18  2023-01-10  Widget A  Electronics        70       29.99      2099.30   

   Region  Total_Cost  Profit_Margin  
0    East       900.0          39.98  
3   South       820.0          39.24  
6   North      1440.0          39.98  
9    West       640.0          39.03  
12   East      1180.0          39.47  
15  South       540.0          39.98  
18  North      1260.0          39.98  


In [85]:
# Write a function to calculate profit margin:
# Input: revenue and cost
# Output: profit margin percentage

def calculate_profit_margin(revenue, cost):
    if revenue == 0:
        return 0.00  # avoid dividing by zero when there is no revenue
    # round() is Python's built-in function; .round() is the pandas method
    return round((revenue - cost) / revenue * 100, 2)

# Apply to widget_a; assign() returns a new DataFrame, which avoids SettingWithCopyWarning
widget_a = widget_a.assign(
    Profit_Margin=widget_a.apply(
        lambda row: calculate_profit_margin(row['Total_Sales'], row['Total_Cost']), axis=1
    )
)
print(widget_a)

          Date   Product     Category  Quantity  Unit_Price  Total_Sales  \
0   2023-01-01  Widget A  Electronics        50       29.99      1499.50   
3   2023-01-02  Widget A  Electronics        45       29.99      1349.55   
6   2023-01-04  Widget A  Electronics        80       29.99      2399.20   
9   2023-01-05  Widget A  Electronics        35       29.99      1049.65   
12  2023-01-07  Widget A  Electronics        65       29.99      1949.35   
15  2023-01-08  Widget A  Electronics        30       29.99       899.70   
18  2023-01-10  Widget A  Electronics        70       29.99      2099.30   

   Region  Total_Cost  Profit_Margin  
0    East       900.0          39.98  
3   South       820.0          39.24  
6   North      1440.0          39.98  
9    West       640.0          39.03  
12   East      1180.0          39.47  
15  South       540.0          39.98  
18  North      1260.0          39.98  
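
Profit margin and markup are easy to mix up: margin divides profit by revenue, while markup divides profit by cost. The sketch below contrasts the two in plain Python, using the first Widget A row (revenue 1499.50, cost 900.0). The function names are hypothetical, and returning `None` for an undefined ratio is an assumption rather than a convention from this notebook.

```python
def profit_margin(revenue, cost):
    """Profit as a percentage of revenue; None when revenue is zero."""
    if revenue == 0:
        return None
    return round((revenue - cost) / revenue * 100, 2)

def markup(revenue, cost):
    """Profit as a percentage of cost; None when cost is zero."""
    if cost == 0:
        return None
    return round((revenue - cost) / cost * 100, 2)

margin_pct = profit_margin(1499.50, 900.0)
markup_pct = markup(1499.50, 900.0)

print(f"Profit margin: {margin_pct}%")
print(f"Markup: {markup_pct}%")
```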
