# Working with Tabular Data

Lesson 12 - Loading and reading CSV files, manipulating rows as dictionaries stored in a list, filtering data based on criteria, and leveraging LLMs to suggest trip activities using the extracted data.

Let's learn about CSV files, which structure data into rows and columns (tabular data, yes!).

Text files are great, but sometimes you need a bit more organization and structure. That's where CSV files come into play.

Imagine you have a bunch of information about customer tickets in a .csv file, and you would like to understand it a bit better.

In [3]:
# Super popular library for working with tabular data
import pandas as pd
from ai_tools import ask_ai

In [5]:
data_customer_tickets = pd.read_csv("./extracted_ticket_issues.csv")

data_customer_tickets

| | customer_name | issue_description | priority |
|---|---|---|---|
| 0 | Jane Doe | Customer was charged twice for the same transa... | High |
| 1 | John Smith | Customer unable to log into their account, fac... | Medium |
| 2 | Alice Johnson | Customer wants more information about product ... | Low |
| 3 | Bob Brown | Customer has not received the order yet, track... | High |
| 4 | Michael Lee | Customer wants to return a product and needs a... | Medium |


The data contains 3 columns:
1. `customer_name` - names of the customers
2. `issue_description` - description of the issue they had
3. `priority` - reference to the level of priority of that task
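
To confirm which columns a table has after loading it, you can ask pandas directly. The sketch below is only an illustration: it reads from made-up CSV text (the two rows are invented stand-ins, not the real `extracted_ticket_issues.csv`) so that it runs without any file on disk.

```python
import io
import pandas as pd

# Made-up CSV text standing in for the real ticket file (assumption for illustration)
csv_text = """customer_name,issue_description,priority
Jane Doe,Charged twice,High
John Smith,Cannot log in,Medium
"""

# read_csv accepts a file path or any file-like object, such as StringIO
tickets = pd.read_csv(io.StringIO(csv_text))

# Column names as a plain Python list
print(tickets.columns.tolist())

# (number of rows, number of columns)
print(tickets.shape)
```

With this sample text, the column list has three names and the shape is 2 rows by 3 columns.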

For example, we could use pandas to select only the high-priority issues:

In [13]:
high_priority_issues = data_customer_tickets[data_customer_tickets["priority"]=="High"]

Now we can take a look at the issues themselves:

In [15]:
high_priority_issues

| | customer_name | issue_description | priority |
|---|---|---|---|
| 0 | Jane Doe | Customer was charged twice for the same transa... | High |
| 3 | Bob Brown | Customer has not received the order yet, track... | High |
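
The filter above works in two steps, which the following sketch spells out. It uses a small made-up table with the same `priority` column name; the variable names `is_high` and `high_only` are just illustrative choices.

```python
import pandas as pd

# Small stand-in table (invented rows, same column names as the ticket data)
tickets = pd.DataFrame({
    "customer_name": ["Jane Doe", "John Smith", "Bob Brown"],
    "priority": ["High", "Medium", "High"],
})

# Step 1: the comparison produces one True/False value per row
is_high = tickets["priority"] == "High"
print(is_high.tolist())  # [True, False, True]

# Step 2: indexing with that mask keeps only the rows marked True
high_only = tickets[is_high]
print(high_only.index.tolist())  # [0, 2]
```

Notice that the filtered table keeps the original row labels (0 and 2 here) rather than renumbering them, which is what makes it possible to write results back to the matching rows later.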


Awesome! Now we could, for example, use our `ask_ai` tool to categorize the issues, which helps organize the information, and then feed those categories back into the table:

In [18]:
categories_list = []
for issue in high_priority_issues["issue_description"]:
    print(f"Categorizing issue: {issue}")
    category = ask_ai(f"Categorize this issue in just one single word and OUTPUT ONLY THAT WORD:\n\n issue: {issue}\n category: \n")
    print(f"Category: {category}")
    categories_list.append(category)


Categorizing issue: Customer was charged twice for the same transaction.
Category: Billing
Categorizing issue: Customer has not received the order yet, tracking information shows a delay.
Category: Shipping


Notice that we use concepts we've learned before: looping over the issues and saving each category to a list.
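
A model does not always return exactly one clean word, so it can help to tidy each answer before storing it. The sketch below is one possible approach, not part of the lesson's tooling: `fake_ask_ai` is a hypothetical stand-in that returns a canned, slightly messy answer (so the example runs without the real `ask_ai`), and `clean_category` is an illustrative helper name.

```python
def fake_ask_ai(prompt):
    # Hypothetical stand-in for ask_ai: always returns the same messy answer
    return " Billing.\n"


def clean_category(raw_answer):
    # Remove surrounding whitespace, then surrounding periods,
    # then normalize to a capitalized word
    return raw_answer.strip().strip(".").capitalize()


categories_list = []
for issue in ["Customer was charged twice for the same transaction."]:
    raw = fake_ask_ai(f"Categorize this issue in one word:\n\n issue: {issue}")
    categories_list.append(clean_category(raw))

print(categories_list)  # ['Billing']
```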

With that information in hand, we can update the dataframe. First, we create a new column in the dataframe:

In [19]:
data_customer_tickets["issue_category"] = None

In [20]:
# Update categories for high priority issues using the index from high_priority_issues
for idx, category in zip(high_priority_issues.index, categories_list):
    data_customer_tickets.loc[idx, "issue_category"] = category

In [21]:
data_customer_tickets

| | customer_name | issue_description | priority | issue_category |
|---|---|---|---|---|
| 0 | Jane Doe | Customer was charged twice for the same transa... | High | Billing |
| 1 | John Smith | Customer unable to log into their account, fac... | Medium | None |
| 2 | Alice Johnson | Customer wants more information about product ... | Low | None |
| 3 | Bob Brown | Customer has not received the order yet, track... | High | Shipping |
| 4 | Michael Lee | Customer wants to return a product and needs a... | Medium | None |
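
The row-by-row loop used for the update is easy to follow, but the same result can probably be reached with a single `.loc` assignment. The sketch below is an alternative under stated assumptions: the table and the `categories` list are made-up stand-ins for the real data and the `ask_ai` output, and the list must have exactly one entry per selected row.

```python
import pandas as pd

# Invented stand-in for the ticket table
tickets = pd.DataFrame({
    "priority": ["High", "Medium", "Low", "High"],
})
tickets["issue_category"] = None

high = tickets[tickets["priority"] == "High"]

# Made-up stand-ins for the categories returned by ask_ai, in row order
categories = ["Billing", "Shipping"]

# Assign all categories at once to the rows whose labels are in high.index
tickets.loc[high.index, "issue_category"] = categories

print(tickets)
```

Rows 0 and 3 receive a category, while rows 1 and 2 stay empty.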


Notice that the issues we did not analyse still contain `None`, indicating they haven't been categorized yet!
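
If you later want to categorize the remaining tickets, you first need to find them. A minimal sketch, using a made-up table and the illustrative variable name `uncategorized`, selects the rows whose category is still missing with `isna()`:

```python
import pandas as pd

# Invented stand-in for the partially categorized ticket table
tickets = pd.DataFrame({
    "customer_name": ["Jane Doe", "John Smith", "Alice Johnson"],
    "issue_category": ["Billing", None, None],
})

# isna() is True wherever the value is missing (None or NaN)
uncategorized = tickets[tickets["issue_category"].isna()]

print(len(uncategorized))  # 2
print(uncategorized["customer_name"].tolist())  # ['John Smith', 'Alice Johnson']
```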

Besides analysing data, we can also create our own tables with information we care about.
Let's organize a trip and save all the necessary information inside a table, starting with a gear checklist.

In [1]:
gear_check_list = [
    "Tent",
    "Sleeping bag", 
    "Backpack",
    "Hiking boots",
    "Water bottle",
    "First aid kit",
    "Flashlight",
    "Map and compass",
    "Rain jacket",
    "Extra clothes"
]

travel_check_list_table = {
    "gear": gear_check_list,
    "checked": [False] * len(gear_check_list),
    "price": [200, 100, 80, 120, 20, 40, 25, 30, 60, 100]
}

In [4]:
table_travel = pd.DataFrame(travel_check_list_table)

table_travel

| | gear | checked | price |
|---|---|---|---|
| 0 | Tent | False | 200 |
| 1 | Sleeping bag | False | 100 |
| 2 | Backpack | False | 80 |
| 3 | Hiking boots | False | 120 |
| 4 | Water bottle | False | 20 |
| 5 | First aid kit | False | 40 |
| 6 | Flashlight | False | 25 |
| 7 | Map and compass | False | 30 |
| 8 | Rain jacket | False | 60 |
| 9 | Extra clothes | False | 100 |
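
Once the checklist is a dataframe, you can tick items off, add up costs, and turn the table back into CSV. The sketch below shows one way this might look on a shortened, three-item version of the table; `remaining_cost` and `csv_text` are illustrative names, not part of the lesson.

```python
import pandas as pd

# Shortened version of the gear table (first three items only)
table_travel = pd.DataFrame({
    "gear": ["Tent", "Sleeping bag", "Backpack"],
    "checked": [False, False, False],
    "price": [200, 100, 80],
})

# Mark the tent as packed by selecting its row with a boolean mask
table_travel.loc[table_travel["gear"] == "Tent", "checked"] = True

# Add up the price of everything that is still unchecked (~ flips True/False)
remaining_cost = table_travel.loc[~table_travel["checked"], "price"].sum()
print(remaining_cost)  # 180

# Without a path, to_csv returns the CSV as a string;
# index=False leaves out the row labels
csv_text = table_travel.to_csv(index=False)
print(csv_text)
```

Passing a file path as the first argument to `to_csv` writes the same content to disk instead of returning it as a string.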
