## Task 1 — Books Data Analysis

You have book data with the following fields:
**Title**, **Author**, **Publication Year**, **Price**.

### Objectives:
- Create a DataFrame with **at least six rows**.
- Display the **entire DataFrame**.
- Calculate the **average price** of the books.
- Find the books **published after 2015**.
- Sort the DataFrame by **price in ascending order**.


In [1]:
import pandas as pd

books = pd.DataFrame({
    "Title": [
        "Clean Code", "Deep Learning", "Python Crash Course",
        "The Pragmatic Programmer", "Fluent Python", "Hands-On ML"
    ],
    "Author": [
        "Robert C. Martin", "Ian Goodfellow", "Eric Matthes",
        "Andrew Hunt", "Luciano Ramalho", "Aurélien Géron"
    ],
    "Publication Year": [2008, 2016, 2019, 1999, 2015, 2017],
    "Price": [28.50, 45.00, 32.00, 30.00, 40.00, 50.00]
})


In [7]:
print("Full DataFrame:")
print(books)


Full DataFrame:
                      Title            Author  Publication Year  Price
0                Clean Code  Robert C. Martin              2008   28.5
1             Deep Learning    Ian Goodfellow              2016   45.0
2       Python Crash Course      Eric Matthes              2019   32.0
3  The Pragmatic Programmer       Andrew Hunt              1999   30.0
4             Fluent Python   Luciano Ramalho              2015   40.0
5               Hands-On ML    Aurélien Géron              2017   50.0
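With only six rows, `print(books)` already shows every row. For larger frames, pandas truncates printed output once the row count exceeds the `display.max_rows` option. A minimal sketch of two ways to force the full display, shown on a three-row subset of the book data:

```python
import pandas as pd

books = pd.DataFrame({
    "Title": ["Clean Code", "Deep Learning", "Fluent Python"],
    "Price": [28.50, 45.00, 40.00],
})

# to_string() renders every row, regardless of the display.max_rows limit
full_text = books.to_string()
print(full_text)

# Alternatively, lift the limit only inside a temporary options context
with pd.option_context("display.max_rows", None):
    print(books)
```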


In [8]:
avg_price = books["Price"].mean()
print("\nAverage book price:", avg_price)
after_2015 = books[books["Publication Year"] > 2015]


Average book price: 37.583333333333336
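The mean prints with full floating-point precision. For a price, two decimals are usually easier to read. A sketch that formats only the displayed value with an f-string format spec, leaving the computed mean unchanged:

```python
import pandas as pd

prices = pd.Series([28.50, 45.00, 32.00, 30.00, 40.00, 50.00], name="Price")

avg_price = prices.mean()  # 225.5 / 6
print(f"Average book price: {avg_price:.2f}")  # rounding applies to the display only
```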


In [9]:
print("\nBooks published after 2015:")
print(after_2015)


Books published after 2015:
                 Title          Author  Publication Year  Price
1        Deep Learning  Ian Goodfellow              2016   45.0
2  Python Crash Course    Eric Matthes              2019   32.0
5          Hands-On ML  Aurélien Géron              2017   50.0
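The objectives also ask for the DataFrame sorted by price in ascending order, which none of the cells above do. A minimal sketch using `sort_values` (ascending is the default and is spelled out only for clarity; the Author column is omitted to keep it short). It returns a new DataFrame and leaves `books` unchanged:

```python
import pandas as pd

books = pd.DataFrame({
    "Title": ["Clean Code", "Deep Learning", "Python Crash Course",
              "The Pragmatic Programmer", "Fluent Python", "Hands-On ML"],
    "Publication Year": [2008, 2016, 2019, 1999, 2015, 2017],
    "Price": [28.50, 45.00, 32.00, 30.00, 40.00, 50.00],
})

# Cheapest first; the original index labels travel with their rows
sorted_books = books.sort_values(by="Price", ascending=True)
print("Books sorted by price (ascending):")
print(sorted_books)
```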


## Task 2 — Orders Data Analysis

A CSV file contains order data with the following fields:
**Order Number**, **Client**, **Date**, **Amount**.

### Objectives:
- Read the data from the CSV file into a DataFrame.
- Display the **first ten rows** of the DataFrame.
- Determine the **number of orders for each client**.
- Find the **maximum and minimum order amounts**.
- Calculate the **total amount of all orders**.

In [10]:
import pandas as pd

orders = pd.read_csv("orders.csv")
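Because `orders.csv` is not included here, the sketch below reads the first three rows printed by `orders.head(10)` from an in-memory string via `io.StringIO`. It also passes `parse_dates=["Date"]`, an optional addition not in the original cell, so the Date column is parsed as `datetime64` instead of plain strings, which makes filtering and sorting by date straightforward:

```python
import io

import pandas as pd

# Stand-in for orders.csv: the first three rows shown by orders.head(10)
csv_text = """OrderNumber,Client,Date,Amount
1001,Alice,2024-01-10,250
1002,Bob,2024-01-11,180
1003,Alice,2024-01-12,320
"""

# parse_dates converts the Date column to datetime64 while reading
orders = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
print(orders.dtypes)
```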


In [11]:
print("First 10 rows of the DataFrame:")
print(orders.head(10))


First 10 rows of the DataFrame:
   OrderNumber   Client        Date  Amount
0         1001    Alice  2024-01-10     250
1         1002      Bob  2024-01-11     180
2         1003    Alice  2024-01-12     320
3         1004  Charlie  2024-01-13     150
4         1005      Bob  2024-01-14     400
5         1006    Alice  2024-01-15     210
6         1007  Charlie  2024-01-16     500
7         1008      Bob  2024-01-17     130
8         1009    Alice  2024-01-18     275
9         1010  Charlie  2024-01-19     350


In [12]:
orders_per_client = orders["Client"].value_counts()
print("Number of orders per client:")
print(orders_per_client)


Number of orders per client:
Client
Alice      5
Bob        3
Charlie    3
Name: count, dtype: int64
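`value_counts()` orders clients by count, largest first. To list clients alphabetically instead, `groupby(...).size()` gives the same counts sorted by the group key. A sketch using only the first five clients from the `head(10)` output, so the counts are smaller than for the full file:

```python
import pandas as pd

orders = pd.DataFrame({
    "Client": ["Alice", "Bob", "Alice", "Charlie", "Bob"],
})

by_count = orders["Client"].value_counts()  # sorted by count, descending
by_name = orders.groupby("Client").size()   # sorted by client name
print(by_name)
```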


In [13]:
max_amount = orders["Amount"].max()
min_amount = orders["Amount"].min()

print("Maximum order amount:", max_amount)
print("Minimum order amount:", min_amount)


Maximum order amount: 500
Minimum order amount: 130
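Both extremes can also be computed in a single call with `Series.agg`, and `idxmax()` locates the row holding the largest amount. A sketch on the ten rows shown by `head(10)`, which include both the 500 maximum and the 130 minimum:

```python
import pandas as pd

orders = pd.DataFrame({
    "OrderNumber": list(range(1001, 1011)),
    "Amount": [250, 180, 320, 150, 400, 210, 500, 130, 275, 350],
})

stats = orders["Amount"].agg(["min", "max"])  # Series indexed by "min" and "max"
print(stats)

# idxmax() returns the index label of the largest amount; loc fetches that row
largest = orders.loc[orders["Amount"].idxmax()]
print("Largest order:", largest["OrderNumber"])
```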


In [14]:
total_amount = orders["Amount"].sum()
print("Total amount of all orders:", total_amount)


Total amount of all orders: 2955
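Counting orders and summing amounts can be combined into one per-client summary with named aggregation in `groupby().agg()`. A sketch on the ten rows from `head(10)` only, so its totals are lower than those computed from the full file:

```python
import pandas as pd

orders = pd.DataFrame({
    "OrderNumber": list(range(1001, 1011)),
    "Client": ["Alice", "Bob", "Alice", "Charlie", "Bob",
               "Alice", "Charlie", "Bob", "Alice", "Charlie"],
    "Amount": [250, 180, 320, 150, 400, 210, 500, 130, 275, 350],
})

# Named aggregation: new_column=(input column, aggregation function)
summary = orders.groupby("Client").agg(
    orders=("OrderNumber", "count"),
    total=("Amount", "sum"),
)
print(summary)
print("Grand total:", summary["total"].sum())
```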


## Task 3 — Food Products Data Analysis

You have a table with food product data containing the following fields:
**Product**, **Category**, **Calories**, **Protein**.

### Objectives:
- Create a DataFrame with **at least ten rows**.
- Display the **entire DataFrame**.
- Find all products with **calorie content greater than 300**.
- Calculate the **average protein content by category**.
- Sort the DataFrame by **calories in descending order**.


In [15]:
import pandas as pd

food = pd.DataFrame({
    "Product": [
        "Pizza", "Burger", "Salad", "Pasta", "Steak",
        "Soup", "Cake", "Fish", "Rice", "Chicken"
    ],
    "Category": [
        "Fast Food", "Fast Food", "Healthy", "Fast Food", "Meat",
        "Healthy", "Dessert", "Seafood", "Healthy", "Meat"
    ],
    "Calories": [400, 350, 150, 320, 500, 180, 450, 300, 280, 330],
    "Protein": [12, 15, 5, 10, 40, 8, 6, 22, 7, 35]
})


In [16]:
print("Full DataFrame:")
print(food)


Full DataFrame:
   Product   Category  Calories  Protein
0    Pizza  Fast Food       400       12
1   Burger  Fast Food       350       15
2    Salad    Healthy       150        5
3    Pasta  Fast Food       320       10
4    Steak       Meat       500       40
5     Soup    Healthy       180        8
6     Cake    Dessert       450        6
7     Fish    Seafood       300       22
8     Rice    Healthy       280        7
9  Chicken       Meat       330       35


In [17]:
high_calorie = food[food["Calories"] > 300]
print("Products with calories greater than 300:")
print(high_calorie)


Products with calories greater than 300:
   Product   Category  Calories  Protein
0    Pizza  Fast Food       400       12
1   Burger  Fast Food       350       15
3    Pasta  Fast Food       320       10
4    Steak       Meat       500       40
6     Cake    Dessert       450        6
9  Chicken       Meat       330       35
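The comparison is strict, so Fish, at exactly 300 calories, is excluded; `>=` would include it. The same filter can also be written with `DataFrame.query`, which can read more easily for compound conditions. A sketch on a subset of the food data:

```python
import pandas as pd

food = pd.DataFrame({
    "Product": ["Pizza", "Salad", "Steak", "Fish", "Chicken"],
    "Calories": [400, 150, 500, 300, 330],
})

strict = food.query("Calories > 300")      # same as food[food["Calories"] > 300]
inclusive = food[food["Calories"] >= 300]  # also keeps Fish at exactly 300
print(strict)
print(inclusive)
```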


In [18]:
avg_protein = food.groupby("Category")["Protein"].mean()
print("Average protein content by category:")
print(avg_protein)


Average protein content by category:
Category
Dessert       6.000000
Fast Food    12.333333
Healthy       6.666667
Meat         37.500000
Seafood      22.000000
Name: Protein, dtype: float64
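Dessert and Seafood contain a single product each in this data, so their "averages" are just one value. Showing the group size next to the mean makes that visible. A sketch using `agg` with both statistics and `round` for display:

```python
import pandas as pd

food = pd.DataFrame({
    "Category": ["Fast Food", "Fast Food", "Healthy", "Fast Food", "Meat",
                 "Healthy", "Dessert", "Seafood", "Healthy", "Meat"],
    "Protein": [12, 15, 5, 10, 40, 8, 6, 22, 7, 35],
})

# Mean and number of products per category, rounded to two decimals
protein_stats = food.groupby("Category")["Protein"].agg(["mean", "count"]).round(2)
print(protein_stats)
```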


In [19]:
sorted_food = food.sort_values(by="Calories", ascending=False)
print("Sorted by calories (descending):")
print(sorted_food)


Sorted by calories (descending):
   Product   Category  Calories  Protein
4    Steak       Meat       500       40
6     Cake    Dessert       450        6
0    Pizza  Fast Food       400       12
1   Burger  Fast Food       350       15
9  Chicken       Meat       330       35
3    Pasta  Fast Food       320       10
7     Fish    Seafood       300       22
8     Rice    Healthy       280        7
5     Soup    Healthy       180        8
2    Salad    Healthy       150        5
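When only the top few rows are needed, `DataFrame.nlargest` returns them directly. The pandas documentation describes it as equivalent to `sort_values(..., ascending=False).head(n)` but more performant. A sketch on a subset of the food data:

```python
import pandas as pd

food = pd.DataFrame({
    "Product": ["Pizza", "Burger", "Salad", "Steak", "Cake"],
    "Calories": [400, 350, 150, 500, 450],
})

top3 = food.nlargest(3, "Calories")  # three highest-calorie products, largest first
print(top3)
```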


## Task 4 — Employees and Projects Analysis

Employee project data is collected with the following fields:
**Name**, **Project**, **Hours**.

### Objectives:
- Create a DataFrame with **at least eight rows**.
- Display the **original DataFrame**.
- Calculate the **total number of hours per employee**.
- Calculate the **total number of hours per project**.
- Identify the **employee who spent the most hours**.

In [21]:
import pandas as pd

employees = pd.DataFrame({
    "Name": [
        "Alice", "Bob", "Charlie", "Alice",
        "Bob", "Charlie", "Alice", "Bob"
    ],
    "Project": [
        "Project A", "Project A", "Project A",
        "Project B", "Project B", "Project B",
        "Project C", "Project C"
    ],
    "Hours": [10, 8, 12, 15, 20, 10, 18, 14]
})


In [22]:
print("Original DataFrame:")
print(employees)


Original DataFrame:
      Name    Project  Hours
0    Alice  Project A     10
1      Bob  Project A      8
2  Charlie  Project A     12
3    Alice  Project B     15
4      Bob  Project B     20
5  Charlie  Project B     10
6    Alice  Project C     18
7      Bob  Project C     14


In [23]:
hours_per_employee = employees.groupby("Name")["Hours"].sum()
print("Total hours per employee:")
print(hours_per_employee)


Total hours per employee:
Name
Alice      43
Bob        42
Charlie    22
Name: Hours, dtype: int64


In [24]:
hours_per_project = employees.groupby("Project")["Hours"].sum()
print("Total hours per project:")
print(hours_per_project)


Total hours per project:
Project
Project A    30
Project B    45
Project C    32
Name: Hours, dtype: int64
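The two groupings can also be viewed together as a table of employees by project using `pivot_table`. Here `fill_value=0` marks projects an employee did not work on, and `margins=True` adds row and column totals that match the per-employee and per-project sums:

```python
import pandas as pd

employees = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Alice",
             "Bob", "Charlie", "Alice", "Bob"],
    "Project": ["Project A", "Project A", "Project A",
                "Project B", "Project B", "Project B",
                "Project C", "Project C"],
    "Hours": [10, 8, 12, 15, 20, 10, 18, 14],
})

# Rows: employees, columns: projects, cells: summed hours (0 where no work)
table = employees.pivot_table(
    index="Name", columns="Project", values="Hours",
    aggfunc="sum", fill_value=0, margins=True, margins_name="Total",
)
print(table)
```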


In [25]:
top_employee = hours_per_employee.idxmax()
print("Employee with the most hours:", top_employee)


Employee with the most hours: Alice
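`idxmax()` returns only the first label holding the maximum, so a tie for first place would go unnoticed. A sketch that reports the hours as well and keeps every employee tied at the top, built from the per-employee totals above:

```python
import pandas as pd

hours_per_employee = pd.Series({"Alice": 43, "Bob": 42, "Charlie": 22}, name="Hours")

top_hours = hours_per_employee.max()
# Boolean mask keeps every name equal to the maximum, not just the first one
top_names = hours_per_employee[hours_per_employee == top_hours].index.tolist()
print("Most hours:", top_names, top_hours)
```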


## Task 5 — Ticket Sales Analysis

You have a table containing ticket sales data with the following fields:
**Movie**, **City**, **Tickets Sold**.

### Objectives:
- Create a DataFrame with **at least twelve rows**.
- Display the **entire DataFrame**.
- Calculate the **total number of tickets sold for each movie**.
- Calculate the **total number of tickets sold for each city**.
- Identify the **movie with the highest number of ticket sales**.


In [26]:
import pandas as pd

tickets = pd.DataFrame({
    "Movie": [
        "Movie A", "Movie A", "Movie B", "Movie B",
        "Movie C", "Movie C", "Movie D", "Movie D",
        "Movie E", "Movie E", "Movie F", "Movie F"
    ],
    "City": [
        "New York", "Los Angeles", "New York", "Los Angeles",
        "New York", "Los Angeles", "New York", "Los Angeles",
        "New York", "Los Angeles", "New York", "Los Angeles"
    ],
    "Tickets Sold": [120, 150, 200, 180, 90, 110, 300, 250, 160, 170, 220, 210]
})


In [27]:
print("Full DataFrame:")
print(tickets)


Full DataFrame:
      Movie         City  Tickets Sold
0   Movie A     New York           120
1   Movie A  Los Angeles           150
2   Movie B     New York           200
3   Movie B  Los Angeles           180
4   Movie C     New York            90
5   Movie C  Los Angeles           110
6   Movie D     New York           300
7   Movie D  Los Angeles           250
8   Movie E     New York           160
9   Movie E  Los Angeles           170
10  Movie F     New York           220
11  Movie F  Los Angeles           210


In [28]:
tickets_per_movie = tickets.groupby("Movie")["Tickets Sold"].sum()
print("Total tickets sold per movie:")
print(tickets_per_movie)


Total tickets sold per movie:
Movie
Movie A    270
Movie B    380
Movie C    200
Movie D    550
Movie E    330
Movie F    430
Name: Tickets Sold, dtype: int64


In [29]:
tickets_per_city = tickets.groupby("City")["Tickets Sold"].sum()
print("Total tickets sold per city:")
print(tickets_per_city)


Total tickets sold per city:
City
Los Angeles    1070
New York       1090
Name: Tickets Sold, dtype: int64
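Since each movie appears exactly once per city, `pivot` can reshape the data into a movie-by-city table, and summing along each axis reproduces both totals computed above:

```python
import pandas as pd

tickets = pd.DataFrame({
    "Movie": ["Movie A", "Movie A", "Movie B", "Movie B", "Movie C", "Movie C",
              "Movie D", "Movie D", "Movie E", "Movie E", "Movie F", "Movie F"],
    "City": ["New York", "Los Angeles"] * 6,
    "Tickets Sold": [120, 150, 200, 180, 90, 110, 300, 250, 160, 170, 220, 210],
})

# pivot requires unique (Movie, City) pairs; pivot_table would aggregate duplicates
wide = tickets.pivot(index="Movie", columns="City", values="Tickets Sold")
print(wide)
print(wide.sum(axis=1))  # totals per movie
print(wide.sum(axis=0))  # totals per city
```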


In [None]:
top_movie = tickets_per_movie.idxmax()
print("Movie with the highest ticket sales:", top_movie)
