### Problem Overview:

You are given a list of dictionaries. Each dictionary represents a sales transaction and contains three keys: 'id', 'product', and 'price'. 'id' represents the id of a transaction, 'product' represents the name of the product, and 'price' is the price of the product.

Your task is to write a function that filters out transactions with a price greater than a given threshold and returns a new list of dictionaries with the remaining transactions.

The function should return a new list of dictionaries. Each dictionary should contain the 'id', 'product', and 'price' for each transaction.

### Libraries Needed:

None

### Inputs:

The function will take the following inputs:

1. A list of dictionaries 'transactions'. Each dictionary contains three keys: 'id', 'product', and 'price'. 'id' is a string that represents the id of a transaction, 'product' is a string that represents the name of the product, and 'price' is a float that represents the price of the product.
2. A float 'threshold'. This represents the maximum price for a transaction to be included in the output.


### Expected Outputs:

The function should return a list of dictionaries. Each dictionary should contain three keys: 'id', 'product', and 'price'. The list should only include transactions where the 'price' is less than or equal to 'threshold'.

### Data:

In [1]:
transactions = [
    {"id": "t1", "product": "apple", "price": 1.0},
    {"id": "t2", "product": "banana", "price": 0.5},
    {"id": "t3", "product": "cherry", "price": 2.0},
]

In [14]:
def filter_under_price(transactions, thresh_price):
    filtered_transactions = list(filter(lambda x: x["price"] <= thresh_price, transactions))
    return filtered_transactions

In [15]:
filter_under_price(transactions, 1)

[{'id': 't1', 'product': 'apple', 'price': 1.0},
 {'id': 't2', 'product': 'banana', 'price': 0.5}]
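
Note that `filter` returns the original dictionary objects, so changing a result also changes the input. If independent copies are wanted, a list comprehension can build them. The following is a minimal sketch of that variant; the name `filter_under_price_copy` is hypothetical and not part of the exercise:

```python
def filter_under_price_copy(transactions, threshold):
    """Return shallow copies of the transactions priced at or below threshold."""
    return [dict(t) for t in transactions if t["price"] <= threshold]


transactions = [
    {"id": "t1", "product": "apple", "price": 1.0},
    {"id": "t2", "product": "banana", "price": 0.5},
    {"id": "t3", "product": "cherry", "price": 2.0},
]

cheap = filter_under_price_copy(transactions, 1)
print(cheap)
```

Because each result is a copy, editing `cheap[0]` leaves `transactions[0]` untouched.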

#### If this were SQL

- Assuming table = transactions and columns = id, product, and price:

```sql
SELECT *
FROM transactions
WHERE price <= 1
```

**Problem Overview:**

You are given a list of dictionaries. Each dictionary represents a customer and contains three keys: 'id', 'age', and 'purchases'. 'id' represents the id of a customer, 'age' represents the age of the customer, and 'purchases' is a list of prices for each purchase made by the customer.

Your task is to write a function that calculates the average purchase price for each customer who is above a given age and returns a new list of dictionaries with the calculated averages.

The function should return a new list of dictionaries. Each dictionary should contain the 'id' and 'average_purchase' for each customer.

**Libraries Needed:**

None

**Inputs:**

The function will take the following inputs:

1. A list of dictionaries 'customers'. Each dictionary contains three keys: 'id', 'age', and 'purchases'. 'id' is a string that represents the id of a customer, 'age' is an integer that represents the age of the customer, and 'purchases' is a list of floats that represent the prices of the purchases.
2. An integer 'age_limit'. This represents the minimum age for a customer to be included in the output.

**Expected Outputs:**

The function should return a list of dictionaries. Each dictionary should contain two keys: 'id' and 'average_purchase'. The list should only include customers whose age is greater than 'age_limit'. 'average_purchase' should be the average purchase price rounded to 2 decimal places.

**Data:**

```python
customers = [
    {"id": "c1", "age": 25, "purchases": [10.0, 20.0, 30.0]},
    {"id": "c2", "age": 30, "purchases": [15.0, 25.0, 35.0]},
    {"id": "c3", "age": 35, "purchases": [20.0, 30.0, 40.0]},
]
```

**Encrypted Solution:**

Here's the solution, encrypted with a Caesar cipher with a shift of 3 to the right:

```python
ghilqk_fdofoxodwh_dyhudjhfxvw_rphuvfxvw_rphuv, djh_olplw:
    ghilqk_fdofoxodwh_dyhudjh_sxufkdvhfxvw_rphu:
        dyhudjh_sxufkdvh = urxqg{vxp{fxvw_rphu{'sxufkdvhv'}}/ohq{fxvw_rphu{'sxufkdvhv'}}, 2)
        uhwxuq {'lg': fxvw_rphu{'lg'}, 'dyhudjh_sxufkdvh': dyhudjh_sxufkdvh}
    
    uhvxowv = olvw{pdskh{fdofoxodwh_dyhudjh_sxufkdvh, ilowhu{odpegd{fxvw_rphu: fxvw_rphu{'djh'} juhdwhu wkdq djh_olplw, fxvw_rphuv}})}
    uhwxuq uhvxowv
```

The shift of 3 letters to the right was applied to every letter of the solution, but not to special characters, digits or whitespaces.
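
The shifting rule described above can be sketched as follows. This is an illustrative helper, not the tool that produced the ciphertext; the name `caesar_shift` and the assumption that only ASCII letters are shifted are mine:

```python
import string


def caesar_shift(text, shift=3):
    """Shift ASCII letters by `shift` positions, wrapping around the alphabet.

    Digits, punctuation, and whitespace are returned unchanged.
    """
    shifted = []
    for ch in text:
        if ch in string.ascii_lowercase:
            shifted.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            shifted.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            shifted.append(ch)
    return "".join(shifted)


encoded = caesar_shift("def f(x): return x", 3)
decoded = caesar_shift(encoded, -3)
print(encoded)  # ghi i(a): uhwxuq a
print(decoded)  # def f(x): return x
```

Decoding is the same operation with the opposite shift, so `caesar_shift(text, -3)` undoes a shift of 3 to the right.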

In [2]:
import pandas as pd

In [3]:
customers = [
    {"id": "c1", "age": 25, "purchases": [10.0, 20.0, 30.0]},
    {"id": "c2", "age": 30, "purchases": [15.0, 25.0, 35.0]},
    {"id": "c3", "age": 35, "purchases": [20.0, 30.0, 40.0]},
]

In [20]:
def mean_of_list(list_of_float):
    return sum(list_of_float) / len(list_of_float)

In [35]:
def average_price_per_aged_customer(customer_list, age_limit):
    # Keep customers strictly older than age_limit; round each mean to 2 decimals
    return [{"id": customer['id'],
             'average_purchase': round(mean_of_list(customer['purchases']), 2)
             } for customer in customer_list if customer['age'] > age_limit]

print(average_price_per_aged_customer(customers, 0))
print(average_price_per_aged_customer(customers, 30))
print(average_price_per_aged_customer(customers, 35))
print(average_price_per_aged_customer(customers, 40))

[{'id': 'c1', 'average_purchase': 20.0}, {'id': 'c2', 'average_purchase': 25.0}, {'id': 'c3', 'average_purchase': 30.0}]
[{'id': 'c3', 'average_purchase': 30.0}]
[]
[]


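**If this were SQL**

- Assuming table = customers with one row per purchase, columns = id, age, and purchase, and an age limit of 25:

```sql
SELECT id, AVG(purchase) AS average_purchase
FROM customers
WHERE age > 25
GROUP BY id;
```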

### Problem Overview:

You are given a dataset of products with their names, categories, and prices. However, some of the product names contain extra spaces and some prices are missing (represented as None). Your task is to write a function that will clean the dataset by:

- Removing leading and trailing whitespaces from product names.
- Replacing any missing prices (None) with the average price of products in the same category.

The function should return a cleaned list of dictionaries with the updated product names and prices.

### Inputs:

The function will take the following input:

A list of dictionaries 'products'. Each dictionary contains three keys: 'name', 'category', and 'price'. 'name' is a string that represents the name of the product, 'category' is a string that represents the category of the product, and 'price' is a float that represents the price of the product (or None if the price is missing).

### Expected Outputs:

The function should return a list of dictionaries. Each dictionary should contain three keys: 'name', 'category', and 'price'. The 'name' should have leading and trailing whitespaces removed, and any 'price' that was None should be replaced with the average price of other products in the same category, rounded to 2 decimal places.

### Data:

In [36]:
products = [
    {"name": " apple ", "category": "fruit", "price": 1.0},
    {"name": "banana", "category": "fruit", "price": None},
    {"name": "cherry  ", "category": "fruit", "price": 2.0},
    {"name": " lettuce", "category": "vegetable", "price": 1.5},
    {"name": "carrot", "category": "vegetable", "price": None},
]


In [37]:
def clean_products(products):
    # Collect [total, count] per category from the known prices only,
    # so that missing prices do not drag the average down
    categories_average = {}
    for product in products:
        price = product['price']
        if price is None:
            continue
        key_average = categories_average.setdefault(product['category'], [0, 0])
        key_average[0] += price
        key_average[1] += 1

    # Turn each [total, count] pair into an average
    for key in categories_average:
        total, count = categories_average[key]
        categories_average[key] = total / count

    # Strip names and fill missing prices (this updates the input dictionaries in place)
    for product in products:
        product['name'] = product['name'].strip()
        if product['price'] is None:
            # A category with no known prices has no average, so the price stays None
            category_average = categories_average.get(product['category'])
            if category_average is not None:
                product['price'] = round(category_average, 2)

    return products


In [38]:

clean_products(products)

[{'name': 'apple', 'category': 'fruit', 'price': 1.0},
 {'name': 'banana', 'category': 'fruit', 'price': 1.5},
 {'name': 'cherry', 'category': 'fruit', 'price': 2.0},
 {'name': 'lettuce', 'category': 'vegetable', 'price': 1.5},
 {'name': 'carrot', 'category': 'vegetable', 'price': 1.5}]

### Objective:

### Background:

You are a Data Scientist at a retail company and you have been given a dataset containing transaction data. This data includes the transaction date, product ID, quantity, and price. Unfortunately, some of the date values are missing and are marked as None. You need to impute these missing dates using the median date of the entire dataset.

### Question:

Write a function that takes the transaction data and imputes the missing dates with the median date. Make sure to convert the dates from string to Python datetime objects and then back to string in the 'YYYY-MM-DD' format after imputation.

### Libraries Needed:

```python
from datetime import datetime
```

### Inputs:

A list of dictionaries named transactions. Each dictionary contains:

- 'date': a string representing the date of the transaction in 'YYYY-MM-DD' format or None if the date is missing.
- 'product_id': a string representing the product ID.
- 'quantity': an integer representing the quantity of the product.
- 'price': a float representing the price of the product.

### Expected Outputs:

The function should return a list of dictionaries with the same structure as the input, but with all missing dates imputed with the median date of the dataset.

### Data:

```python
transactions = [
    {"date": "2022-07-01", "product_id": "P1", "quantity": 10, "price": 5.5},
    {"date": None, "product_id": "P2", "quantity": 8, "price": 6.0},
    {"date": "2022-07-03", "product_id": "P3", "quantity": 15, "price": 7.0},
    {"date": "2022-07-02", "product_id": "P4", "quantity": 12, "price": 6.5},
    {"date": None, "product_id": "P5", "quantity": 9, "price": 5.0},
]
```

### Solution:

In [5]:
import pandas as pd
import numpy as np

# Read in the data
transactions = [
    {"date": "2022-07-01", "product_id": "P1", "quantity": 10, "price": 5.5},
    {"date": None, "product_id": "P2", "quantity": 8, "price": 6.0},
    {"date": "2022-07-03", "product_id": "P3", "quantity": 15, "price": 7.0},
    {"date": "2022-07-02", "product_id": "P4", "quantity": 12, "price": 6.5},
    {"date": None, "product_id": "P5", "quantity": 9, "price": 5.0},
]
df = pd.DataFrame(transactions)

In [7]:
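def impute_missing_dates(df):
    # Parse the date strings (None becomes NaT)
    df['date'] = pd.to_datetime(df['date'])
    # Fill the gaps with the median of the known dates
    median_date = df['date'].median()
    df['date'] = df['date'].fillna(median_date)
    # Convert back to 'YYYY-MM-DD' strings, as the question asks
    df['date'] = df['date'].dt.strftime('%Y-%m-%d')
    return df

impute_missing_dates(df)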

Unnamed: 0,date,product_id,quantity,price
0,2022-07-01,P1,10,5.5
1,2022-07-02,P2,8,6.0
2,2022-07-03,P3,15,7.0
3,2022-07-02,P4,12,6.5
4,2022-07-02,P5,9,5.0
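
The question asks for a list of dictionaries and lists only `datetime` as a library, so the same imputation can also be done without pandas. The following is a sketch under assumptions: the name `impute_median_date` is hypothetical, and for an even number of known dates it takes the midpoint of the two middle dates and drops any time-of-day part when formatting.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"


def impute_median_date(transactions):
    """Return copies of the transactions with missing dates set to the median date."""
    known = sorted(
        datetime.strptime(t["date"], DATE_FORMAT)
        for t in transactions
        if t["date"] is not None
    )
    if not known:
        # Nothing to compute a median from; return unchanged copies
        return [dict(t) for t in transactions]

    mid = len(known) // 2
    if len(known) % 2 == 1:
        median = known[mid]
    else:
        # Midpoint of the two middle dates (assumed tie rule)
        median = known[mid - 1] + (known[mid] - known[mid - 1]) / 2
    median_str = median.strftime(DATE_FORMAT)

    return [
        {**t, "date": median_str if t["date"] is None else t["date"]}
        for t in transactions
    ]


transactions = [
    {"date": "2022-07-01", "product_id": "P1", "quantity": 10, "price": 5.5},
    {"date": None, "product_id": "P2", "quantity": 8, "price": 6.0},
    {"date": "2022-07-03", "product_id": "P3", "quantity": 15, "price": 7.0},
    {"date": "2022-07-02", "product_id": "P4", "quantity": 12, "price": 6.5},
    {"date": None, "product_id": "P5", "quantity": 9, "price": 5.0},
]

result = impute_median_date(transactions)
print([t["date"] for t in result])
```

This version returns new dictionaries and leaves the input list unchanged.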


### Problem Overview:

You are working as a Data Scientist for a retail company. The company stores data about customer purchases in a DataFrame. Each row in the DataFrame represents a unique purchase and contains the customer's ID, the date of the purchase, the purchased item's ID, and the item's price.

However, the data is not clean. The item prices are stored as strings with a dollar sign, and some of the purchase dates are missing. Your task is to clean the DataFrame by converting the item prices to floats and imputing the missing dates with the median date.

### Libraries Needed:

```python
import pandas as pd
import numpy as np
```

### Inputs:

A pandas DataFrame named df. The DataFrame has the following columns:

- 'customer_id': a string representing the customer's ID.
- 'date': a string representing the date of the purchase in the 'YYYY-MM-DD' format or None if the date is missing.
- 'item_id': a string representing the purchased item's ID.
- 'price': a string representing the price of the item with a dollar sign.

### Expected Outputs:

The function should return a cleaned DataFrame with the same structure as the input, but with the item prices converted to floats and all missing dates imputed with the median date.

### Data:

Let's create a large DataFrame with 1,000 rows. For simplicity, we can randomly generate the data:

In [17]:
import pandas as pd
import numpy as np

np.random.seed(0)

# Generate 1,000 rows of data
n_rows = 1000
customer_ids = [f"C{i}" for i in np.random.randint(1, 100, n_rows)]
dates = pd.date_range(start='2022-01-01', end='2022-12-31').to_list()
dates = np.random.choice(dates + [None] * len(dates), n_rows).tolist()
item_ids = [f"I{i}" for i in np.random.randint(1, 50, n_rows)]
prices = [f"${i:.2f}" for i in np.random.uniform(1, 100, n_rows)]

# Create DataFrame
df = pd.DataFrame({
    'customer_id': customer_ids,
    'date': dates,
    'item_id': item_ids,
    'price': prices,
})


In [18]:
df

Unnamed: 0,customer_id,date,item_id,price
0,C45,NaT,I29,$64.74
1,C48,2022-09-19,I15,$50.68
2,C65,2022-02-08,I15,$81.34
3,C68,NaT,I17,$48.13
4,C68,2022-03-30,I36,$52.79
...,...,...,...,...
995,C6,2022-12-04,I44,$88.09
996,C39,2022-10-27,I26,$29.21
997,C39,2022-11-20,I21,$94.23
998,C66,NaT,I48,$55.07


In [19]:
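def clean_data(dataframe):
    # Parse the dates (a no-op if the column is already datetime), then
    # fill the missing values with the median date
    dataframe['date'] = pd.to_datetime(dataframe['date'])
    median_date = dataframe['date'].median()
    dataframe['date'] = dataframe['date'].fillna(median_date)

    # Convert the price column to float after removing '$' and thousands separators;
    # regex=False treats '$' as a literal character rather than a regex anchor
    dataframe['price'] = (dataframe['price']
                          .str.replace('$', '', regex=False)
                          .str.replace(',', '', regex=False)
                          .astype(float))
    return dataframe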


In [20]:
clean_data(df)

Unnamed: 0,customer_id,date,item_id,price
0,C45,2022-07-01,I29,64.74
1,C48,2022-09-19,I15,50.68
2,C65,2022-02-08,I15,81.34
3,C68,2022-07-01,I17,48.13
4,C68,2022-03-30,I36,52.79
...,...,...,...,...
995,C6,2022-12-04,I44,88.09
996,C39,2022-10-27,I26,29.21
997,C39,2022-11-20,I21,94.23
998,C66,2022-07-01,I48,55.07
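
For a single value, the price conversion amounts to dropping the dollar sign and any thousands separators before calling `float`. A minimal plain-Python sketch, where `parse_price` is a hypothetical helper and not part of the exercise:

```python
def parse_price(raw):
    """Convert a price string such as '$1,234.50' to a float."""
    return float(raw.strip().lstrip("$").replace(",", ""))


print(parse_price("$64.74"))     # 64.74
print(parse_price("$1,234.50"))  # 1234.5
```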


### Objective:

### Background:
You are a data scientist at a retail company, and you are tasked with analyzing the sales data of various products over different months. You need to identify the products that show a significant upward trend in sales.

### Question:
Write a Python function identify_trending_products(sales_data: pd.DataFrame) -> List[str] that takes a pandas DataFrame containing sales data and returns a list of product names that show a statistically significant upward trend in sales over months.

The input DataFrame sales_data has the following columns:

- 'product_name': (str) the name of the product
- 'month': (int) the month of the sale (from 1 to 12)
- 'sales': (float) the sales value for the product in that month

Return a list of product names that show a significant upward trend in sales over months. A product is considered to have a significant upward trend if the slope of its linear regression of sales on month is positive and the p-value of that slope is less than 0.05.

### Inputs:
sales_data: a pandas DataFrame with columns 'product_name' (str), 'month' (int), and 'sales' (float).

### Outputs:
A list of strings containing the names of products that show a statistically significant upward trend in sales.

### Libraries Needed:

```python
import pandas as pd
from scipy.stats import linregress
```

### Data:

In [2]:
import pandas as pd
data = pd.DataFrame({
  "product_name": ["Widget A", "Widget A", "Widget A", "Widget B", "Widget B", "Widget B", "Widget C", "Widget C", "Widget C"],
  "month": [1, 2, 3, 1, 2, 3, 1, 2, 3],
  "sales": [100.0, 150.0, 200.0, 50.0, 45.0, 40.0, 5.0, 10.0, 15.0]
})

In [3]:
data

Unnamed: 0,product_name,month,sales
0,Widget A,1,100.0
1,Widget A,2,150.0
2,Widget A,3,200.0
3,Widget B,1,50.0
4,Widget B,2,45.0
5,Widget B,3,40.0
6,Widget C,1,5.0
7,Widget C,2,10.0
8,Widget C,3,15.0


In [23]:
from scipy.stats import linregress
import pandas as pd

def identify_trending_products(sales_data: pd.DataFrame) -> list:
    trending_products = []  # List to store names of trending products
    grouped_sales = sales_data.groupby('product_name')

    for product_name, product_data in grouped_sales:
        result = linregress(product_data['month'], product_data['sales'])
        if (result.pvalue < 0.05) and (result.slope > 0): #Product Growth with P Value
            trending_products.append(product_name)
            
    return trending_products


In [26]:
growing_products = identify_trending_products(data)
growing_products

['Widget A', 'Widget C']

### Problem Overview:

You are working as a Data Scientist for a retail company. The company stores data about customer purchases in a DataFrame. Each row in the DataFrame represents a unique purchase and contains the customer's ID, the date of the purchase, the purchased item's ID, and the item's price.

However, the data is not clean. The item prices are stored as strings with a dollar sign, and some of the purchase dates are missing. Your task is to clean the DataFrame by converting the item prices to floats and imputing the missing dates with the median date.

### Libraries Needed:

```python

import pandas as pd
import numpy as np
```
### Inputs:

A pandas DataFrame named df. The DataFrame has the following columns:

- 'customer_id': a string representing the customer's ID.
- 'date': a string representing the date of the purchase in the 'YYYY-MM-DD' format or None if the date is missing.
- 'item_id': a string representing the purchased item's ID.
- 'price': a string representing the price of the item with a dollar sign.
### Expected Outputs:

The function should return a cleaned DataFrame with the same structure as the input, but with the item prices converted to floats and all missing dates imputed with the median date.

### Data:

Let's create a large DataFrame with 1,000 rows. For simplicity, we can randomly generate the data:

```python

import pandas as pd
import numpy as np

np.random.seed(0)

# Generate 1,000 rows of data
n_rows = 1000
customer_ids = [f"C{i}" for i in np.random.randint(1, 100, n_rows)]
dates = pd.date_range(start='2022-01-01', end='2022-12-31').to_list()
dates = np.random.choice(dates + [None] * len(dates), n_rows).tolist()
item_ids = [f"I{i}" for i in np.random.randint(1, 50, n_rows)]
prices = [f"${i:.2f}" for i in np.random.uniform(1, 100, n_rows)]

# Create DataFrame
df = pd.DataFrame({
    'customer_id': customer_ids,
    'date': dates,
    'item_id': item_ids,
    'price': prices,
})
```
### Encrypted Solution:

Here's the solution, encrypted with a Caesar cipher with a shift of 3 to the right:

```python

ghilqk_fohdq_gdwbdig:
    # Frqyhuw wkh sulpdwb iulfhv wkh iordwv dqg
    # uhpryh wkh groodu vljqv
    dig['sulpd'] = dig['sulpd'].vwulsb{1:}.dvwbsh{ioraw}
    
    # Frqyhqw wkh gdwhv wkh gdwhwbph rEMhfwv
    dig['gdwh'] = sg.gdwhwbph{dig['gdwh']}
    
    # Lpsxwh wkh plvvlqj gdwhv zlwk wkh phglbdq gdwh
    phglbdq_gdwh = dig['gdwh'].phglbdqbrs
    dig['gdwh'].iloobbr{phglbdq_gdwh, lqsoadh=Wrxd}
    
    # Frqyhqw wkh gdwhv edfn wkh vwulqjv
    dig['gdwh'] = dig['gdwh'].gvwbsh{\'BBBB-PP-GG\'}
    
    uhwxuq dig
```
The shift of 3 letters to the right was applied to every letter of the solution, but not to special characters, digits, or whitespaces.

In [1]:
import pandas as pd
import numpy as np

np.random.seed(0)

# Generate 1,000 rows of data
n_rows = 1000
customer_ids = [f"C{i}" for i in np.random.randint(1, 100, n_rows)]
dates = pd.date_range(start='2022-01-01', end='2022-12-31').to_list()
dates = np.random.choice(dates + [None] * len(dates), n_rows).tolist()
item_ids = [f"I{i}" for i in np.random.randint(1, 50, n_rows)]
prices = [f"${i:.2f}" for i in np.random.uniform(1, 100, n_rows)]

# Create DataFrame
df = pd.DataFrame({
    'customer_id': customer_ids,
    'date': dates,
    'item_id': item_ids,
    'price': prices,
})

In [2]:
df

Unnamed: 0,customer_id,date,item_id,price
0,C45,NaT,I29,$64.74
1,C48,2022-09-19,I15,$50.68
2,C65,2022-02-08,I15,$81.34
3,C68,NaT,I17,$48.13
4,C68,2022-03-30,I36,$52.79
...,...,...,...,...
995,C6,2022-12-04,I44,$88.09
996,C39,2022-10-27,I26,$29.21
997,C39,2022-11-20,I21,$94.23
998,C66,NaT,I48,$55.07


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   customer_id  1000 non-null   object        
 1   date         518 non-null    datetime64[ns]
 2   item_id      1000 non-null   object        
 3   price        1000 non-null   object        
dtypes: datetime64[ns](1), object(3)
memory usage: 31.4+ KB


In [8]:
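def clean_the_data(df):
    # Parse the dates (a no-op if already datetime) and impute the median date
    df['date'] = pd.to_datetime(df['date'])
    median_date = df['date'].median()
    df['date'] = df['date'].fillna(median_date)

    # Remove the literal '$' symbol from each price (regex=False avoids treating it as a regex anchor)
    df['price'] = df['price'].str.replace('$', '', regex=False).astype(float)
    return df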

In [9]:
clean_data = clean_the_data(df)

In [11]:
clean_data.price.describe()

count    1000.00000
mean       51.00319
std        28.38245
min         1.01000
25%        27.87250
50%        50.56000
75%        75.77000
max       100.00000
Name: price, dtype: float64

### Objective:
You are tasked with analyzing customer feedback data to identify trends and sentiments. The dataset contains customer reviews in text format, along with corresponding ratings on a scale of 1 to 5. Your goal is to preprocess the text data, perform sentiment analysis, and generate a summary report.

### Libraries Needed:
You will need the following libraries:

In [9]:
import pandas as pd
import numpy as np
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer


### Data:
You are provided with a JSON file named "customer_feedback.json" that contains the following columns: "review_text" (textual customer reviews) and "rating" (integer ratings from 1 to 5).

### Inputs:
Columns: "review_text" (str), "rating" (int)

### Expected Outputs:
A summary report in the form of a DataFrame with the following columns: "Rating", "Positive Reviews", "Neutral Reviews", "Negative Reviews".
For each rating (1 to 5), count the number of reviews classified as positive, neutral, and negative based on sentiment analysis.

In [10]:
json_data = [
    {
        "review_text": "Great product, very satisfied with my purchase.",
        "rating": 5
    },
    {
        "review_text": "The quality is not up to my expectations.",
        "rating": 2
    },
  {
    "review_text": "Great product, very satisfied with my purchase.",
    "rating": 5
  },
  {
    "review_text": "The quality is not up to my expectations.",
    "rating": 2
  },
  {
    "review_text": "Fast delivery and excellent customer service.",
    "rating": 4
  },
  {
    "review_text": "I would not recommend this product to others.",
    "rating": 1
  },
  {
    "review_text": "Average performance, nothing exceptional.",
    "rating": 3
  },
  {
    "review_text": "Outstanding experience with the product and seller.",
    "rating": 5
  },
  {
    "review_text": "Terrible quality, fell apart after a few uses.",
    "rating": 1
  },
  {
    "review_text": "Good value for the price.",
    "rating": 4
  },
  {
    "review_text": "Not bad, but could be better.",
    "rating": 3
  },
  {
    "review_text": "Best purchase I've made in a while!",
    "rating": 5
  },
  {
    "review_text": "Extremely disappointed, regret buying it.",
    "rating": 1
  },
  {
    "review_text": "Satisfied with the overall performance.",
    "rating": 4
  },
  {
    "review_text": "Could use some improvements, but it's decent.",
    "rating": 3
  },
  {
    "review_text": "Absolutely fantastic, exceeded my expectations!",
    "rating": 5
  },
  {
    "review_text": "Avoid at all costs, waste of money.",
    "rating": 1
  },
  {
    "review_text": "Reasonably good quality for the price.",
    "rating": 4
  },
  {
    "review_text": "Meh, not impressed.",
    "rating": 2
  },
  {
    "review_text": "Highly recommended, top-notch product!",
    "rating": 5
  },
  {
    "review_text": "The worst thing I've ever purchased.",
    "rating": 1
  },
  {
    "review_text": "Decent experience, nothing extraordinary.",
    "rating": 3
  },
  {
    "review_text": "Impressed with the features and performance.",
    "rating": 4
  },
  {
    "review_text": "Absolutely awful, total waste of money.",
    "rating": 1
  },
  {
    "review_text": "Solid product, reliable and functional.",
    "rating": 4
  },
  {
    "review_text": "Could have been better, not too happy.",
    "rating": 2
  },
  {
    "review_text": "Exceeded my expectations, very satisfied!",
    "rating": 5
  },
  {
    "review_text": "Disappointed with the quality, broke easily.",
    "rating": 2
  },
  {
    "review_text": "Good purchase, serves its purpose well.",
    "rating": 4
  },
  {
    "review_text": "Waste of money, won't buy again.",
    "rating": 1
  },
  {
    "review_text": "Average quality, nothing special.",
    "rating": 3
  },
  {
    "review_text": "Highly impressed, great value for the price.",
    "rating": 5
  },
  {
    "review_text": "Awful product, regret buying it.",
    "rating": 1
  },
  {
    "review_text": "Satisfactory performance, met my needs.",
    "rating": 4
  },
  {
    "review_text": "Not very good, needs improvement.",
    "rating": 2
  },
  {
    "review_text": "Exceptional product, exceeded expectations.",
    "rating": 5
  },
  {
    "review_text": "Complete waste of money, do not recommend.",
    "rating": 1
  },
  {
    "review_text": "Decent value for the price paid.",
    "rating": 3
  },
  {
    "review_text": "Not impressed, would not buy again.",
    "rating": 2
  },
  {
    "review_text": "Absolutely amazing, worth every penny!",
    "rating": 5
  },
  {
    "review_text": "Horrible quality, fell apart quickly.",
    "rating": 1
  },
  {
    "review_text": "Good performance, satisfied overall.",
    "rating": 4
  },
  {
    "review_text": "Could be better, but it's acceptable.",
    "rating": 3
  },
  {
    "review_text": "Incredible product, highly recommended!",
    "rating": 5
  },
  {
    "review_text": "Avoid this product, not worth it.",
    "rating": 1
  },
  {
    "review_text": "Reasonable quality, met expectations.",
    "rating": 3
  },
  {
    "review_text": "Below average, disappointed.",
    "rating": 2
  },
  {
    "review_text": "Top-notch quality, very satisfied!",
    "rating": 5
  },
  {
    "review_text": "Worst purchase ever, regretting it.",
    "rating": 1
  },
  {
    "review_text": "Average performance, nothing exceptional.",
    "rating": 3
  },
  {
    "review_text": "Good value for the money spent.",
    "rating": 4
  },
  {
    "review_text": "Not up to par, needs improvement.",
    "rating": 2
  },
  {
    "review_text": "Absolutely outstanding, thrilled!",
    "rating": 5
  },
  {
    "review_text": "Total waste of money, avoid.",
    "rating": 1
  },
  {
    "review_text": "Met my expectations, decent product.",
    "rating": 4
  },
  {
    "review_text": "Could have been better, not satisfied.",
    "rating": 2
  },
  {
    "review_text": "Impressed beyond words, excellent!",
    "rating": 5
  },
  {
    "review_text": "Horrible quality, fell apart quickly.",
    "rating": 1
  },
  {
    "review_text": "Satisfactory performance, met my needs.",
    "rating": 4
  },
  {
    "review_text": "Disappointing purchase, regretting it.",
    "rating": 1
  },
  {
    "review_text": "Average quality, nothing special.",
    "rating": 3
  },
  {
    "review_text": "Top-notch product, highly satisfied!",
    "rating": 5
  }
]

In [11]:
# Create a DataFrame from the JSON data
df = pd.DataFrame(json_data)
df

Unnamed: 0,review_text,rating
0,"Great product, very satisfied with my purchase.",5
1,The quality is not up to my expectations.,2
2,"Great product, very satisfied with my purchase.",5
3,The quality is not up to my expectations.,2
4,Fast delivery and excellent customer service.,4
...,...,...
57,"Horrible quality, fell apart quickly.",1
58,"Satisfactory performance, met my needs.",4
59,"Disappointing purchase, regretting it.",1
60,"Average quality, nothing special.",3


In [12]:
nltk.download('vader_lexicon')
sia = SentimentIntensityAnalyzer()

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/danmarino/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [13]:
# Define a function to categorize sentiment
def get_sentiment_label(score):
    if score > 0.05:
        return "Positive"
    elif score < -0.05:
        return "Negative"
    else:
        return "Neutral"

df['sentiment'] = df['review_text'].apply(lambda x: get_sentiment_label(sia.polarity_scores(x)['compound']))

In [16]:
summary_report = pd.pivot_table(df, index='rating', columns='sentiment', values='review_text', aggfunc='count', fill_value=0)
summary_report.columns.name = None
summary_report.reset_index(inplace=True)

# Rename columns for clarity
summary_report.rename(columns={'Positive': 'Positive Reviews', 'Neutral': 'Neutral Reviews', 'Negative': 'Negative Reviews'}, inplace=True)

# Display the summary report
summary_report

Unnamed: 0,rating,Negative Reviews,Neutral Reviews,Positive Reviews
0,1,15,0,0
1,2,4,3,3
2,3,2,4,4
3,4,0,1,11
4,5,0,1,14
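
`pivot_table` only creates columns for sentiment labels that actually occur in the data, so a dataset with, say, no neutral reviews would produce a report without a "Neutral Reviews" column. The sketch below shows one way to guarantee all three counts for every rating using only the standard library; `summarize` and `LABELS` are hypothetical names, and the input is assumed to be (rating, label) pairs.

```python
from collections import Counter

LABELS = ("Positive", "Neutral", "Negative")


def summarize(rows):
    """Tally sentiment labels per rating.

    `rows` is an iterable of (rating, label) pairs. Every rating from 1 to 5
    gets an entry and every label gets a count, even when that count is zero.
    """
    counts = Counter(rows)
    return [
        {"Rating": rating,
         **{f"{label} Reviews": counts[(rating, label)] for label in LABELS}}
        for rating in range(1, 6)
    ]


sample = [(5, "Positive"), (5, "Positive"), (1, "Negative"), (3, "Neutral")]
report = summarize(sample)
for row in report:
    print(row)
```

The resulting list of dictionaries can be passed to `pd.DataFrame` to obtain the same layout as the report above.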


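### The sentiment of the reviews aligns well with the ratings

All 15 one-star reviews are classified as negative, and 25 of the 27 four- and five-star reviews are classified as positive. The two- and three-star reviews are split across all three labels.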