# Version 2 - Goal Explorer General Improvements

## Ver 2.0 - Generate goal sub categories with one prompt
- prompt modification in `goal.py`  

``` python
user_prompt = f"""The number of GOALS to generate is {n}. 
If there are columns with data type category or string, generate goals for them. Generate a number of goals equal to ceil(n / 3).
If there are columns with data type number, generate goals for them. Generate a number of goals equal to ceil(n / 3).
If there are columns with data type date, generate goals for them. Generate a number of goals equal to ceil(n / 3).
If any of the above data types are not present in the dataset, distribute the remaining goals evenly among the available data types.
Ensure the total number of goals generated is exactly {n}.
The goals should be based on the data summary below, \n\n .
{summary} \n\n"""
```
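
A quick arithmetic check, separate from `goal.py`, suggests why the per-type quota and the "exactly {n}" instruction can conflict: three quotas of ceil(n / 3) add up to more than n whenever n is not a multiple of 3. The helper name `per_type_quota_total` is made up for this illustration.

```python
import math

def per_type_quota_total(n: int, types: int = 3) -> int:
    """Total goals requested if every data type gets ceil(n / types) goals."""
    return types * math.ceil(n / types)

# Compare the requested total against n for a few values of n.
for n in (9, 10, 20):
    print(f"n={n}: per-type quotas add up to {per_type_quota_total(n)}")
```

For n = 20, as used later in this notebook, the three quotas ask for 21 goals, so the model has to decide on its own which quota to shorten.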

## Ver 2.1 - Generate goal sub categories with one prompt (more specific)
- prompt modification in `goal.py`

``` python
user_prompt = f"""Generate a TOTAL of {n} goals based on the following criteria:

1. If there are columns with data type category or string, generate a number of goals equal to ceil(n / 4) that focus EXCLUSIVELY on these columns.
2. If there are columns with data type number, generate a number of goals equal to ceil(n / 4) that focus EXCLUSIVELY on these columns.
3. If there are columns with data type date, generate a number of goals equal to ceil(n / 4) that focus EXCLUSIVELY on these columns.

For the remaining goals, denoted as m:
- Generate ceil(m / 2) goals that explore relationships between two variables.
- Generate ceil(m / 2) goals that explore relationships among three variables.

If any of the above data types are not present in the dataset, distribute the remaining goals evenly by creating mixed goals.

Ensure the total number of goals generated is exactly {n}.

The goals should be based on the data summary below, \n\n .
{summary} \n\n"""
```
## Ver 2.2 - Generate goal sub categories with one prompt (more explicit)

``` python
user_prompt = f"""Generate a TOTAL of {n} goals based on the following criteria:

1. **Single Data Type Focus:** 
- For categorical or string columns, generate ceil(n / 4) goals focusing EXCLUSIVELY on these columns.
- For numerical columns, generate ceil(n / 4) goals focusing EXCLUSIVELY on these columns.
- For date columns, generate ceil(n / 4) goals focusing EXCLUSIVELY on these columns.

2. **Exploring Relationships:**
- For the remaining goals (denoted as m):
    - Generate ceil(m / 2) goals that explore relationships between EXACTLY TWO variables.
    - Generate ceil(m / 2) goals that analyze relationships involving AT LEAST three variables (e.g. with grouped bar charts).

3. **Ensure Total Count:**
- Ensure the total number of goals generated is exactly {n}.

The goals should be based on the data summary below, \n\n .
{summary} \n\n"""
```
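
The quota arithmetic in this prompt can be checked by hand. The sketch below assumes that m means "n minus the three single-type quotas", which is one reading of "remaining goals"; the function name and dictionary keys are invented for the illustration.

```python
import math

def requested_total(n: int) -> dict:
    """Add up the quotas the prompt asks for, under the assumption
    that m = n - 3 * ceil(n / 4)."""
    single = math.ceil(n / 4)        # quota for each single data type
    m = max(n - 3 * single, 0)       # goals left for relationship goals
    two_var = math.ceil(m / 2)       # goals on exactly two variables
    three_plus = math.ceil(m / 2)    # goals on at least three variables
    return {
        "single_type": 3 * single,
        "m": m,
        "two_var": two_var,
        "three_plus": three_plus,
        "total": 3 * single + two_var + three_plus,
    }

print(requested_total(20))
print(requested_total(16))
```

Under that assumption the quotas sum to n only when the remainder m is even (for example n = 16); for n = 20 they sum to 21.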

## Ver 2.3 - Generate goal sub categories with one prompt (Shortened)

``` python
user_prompt = f"""Generate a TOTAL of {n} goals based on the following criteria:

1. For categorical or string columns, generate goals focusing EXCLUSIVELY on these columns.
2. For numerical columns, generate goals focusing EXCLUSIVELY on these columns.
3. For date columns, generate goals focusing EXCLUSIVELY on these columns.

Make the goals generated evenly distributed across the three and the total number of goals should be {n}
The goals should be based on the data summary below, \n\n .
{summary} \n\n"""
```

## Ver 2.4 - Focus on just generating goals with 3 variables
- prompt modification in `goal.py`
- observation: was able to generate new plots and more insightful connections between 3 variables.

``` python
user_prompt = f"""Generate a TOTAL of {n} goals. All the goals must explore the relationships of at  least 3 variables. 

The goals should be based on the data summary below, \n\n .
{summary} \n\n"""
```

## Ver 2.5 - Adding helper functions to generate goals for subcategories
### Ver 2.5.1 - Add generate categorical

- prompt in the `generate_categorical` function

``` python
user_prompt = f"""Generate a TOTAL of AT LEAST {n} goals. All the goals generated must focus on a column with a 'categorical' or 'string' data type. 
If there are no columns with the data type 'category' or 'string', return an empty string.

The goals should be based on the data summary below, \n\n .
{summary} \n\n"""

```
   - generates roughly `n` goals that focus on categorical or string columns
   - returns an empty string if there is no categorical or string column

### Ver 2.5.1 - Add generate date
- prompt in the `generate_date` function

``` python
user_prompt = f"""Generate A MAXIMUM of {n} goals. All the goals generated must focus on a column with a 'date' data type. 
If there are no columns with the data type 'date', return an empty string.

The goals should be based on the data summary below, \n\n .
{summary} \n\n"""
```
   - generates up to `n` goals that focus on date columns
   - returns an empty string if there is no date column

### Ver 2.5.2 - Add categorical, date and mix 3 and mix 2 in one function
- added a general helper function `generate_goals` which takes in a `focus` to know what type of goals it should generate
- bug: indexing is wrong when they're all joined together
- bug: too wasteful, because the prompt itself is what checks whether the summary has a categorical column, so an LLM call is spent even when there is none
- bug: allocation of number of goals per type is not correct.
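
One way to avoid spending a call on that check would be to inspect the summary locally before prompting. The sketch below is an assumption about how this could look, not the code in `goal.py`; it relies only on the `summary['fields'][i]['properties']['dtype']` nesting that the notebook prints later, and the `has_dtype` name and the `column` key in the toy data are illustrative.

```python
def has_dtype(summary: dict, wanted: set) -> bool:
    """Return True if any field in the summary has a dtype in `wanted`."""
    for field in summary.get("fields", []):
        if field.get("properties", {}).get("dtype") in wanted:
            return True
    return False

# Toy summary with the same nesting as the real one; column names are illustrative.
toy_summary = {
    "fields": [
        {"column": "Type", "properties": {"dtype": "category"}},
        {"column": "Retail_Price", "properties": {"dtype": "number"}},
    ]
}

print(has_dtype(toy_summary, {"category", "string"}))
print(has_dtype(toy_summary, {"date"}))
```

With a check like this, a helper for a given `focus` could return early when the data type is absent instead of asking the model to return an empty string.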

### Ver 2.5.3 - Added calculate_distribution function
- added `calculate_distribution` function to automatically calculate the number of goals per category.
  - generate the most for mix 2, then split among category, date and mix 3 when applicable
- bug: indexing is wrong
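
The actual `calculate_distribution` is not reproduced here. As a rough sketch of one possible reading of the rule above, the function below gives every applicable focus an equal share and hands any leftover to mix 2, so mix 2 never gets fewer goals than the others. The focus names (`mix_2`, `mix_3`, `category`, `date`) and the signature are hypothetical.

```python
def calculate_distribution_sketch(n: int, has_category: bool, has_date: bool) -> dict:
    """Split n goals across focuses; mix_2 absorbs the remainder.

    Hypothetical stand-in for the notebook's calculate_distribution.
    """
    others = ["mix_3"]
    if has_category:
        others.append("category")
    if has_date:
        others.append("date")

    share = n // (len(others) + 1)            # equal share for every focus
    counts = {"mix_2": n - share * len(others)}  # mix_2 takes what is left
    for focus in others:
        counts[focus] = share
    return counts

print(calculate_distribution_sketch(20, has_category=True, has_date=False))
print(calculate_distribution_sketch(7, has_category=True, has_date=True))
```

Because the counts are computed in code rather than by the model, they always sum to exactly n.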

### Ver 2.5.4 - Fixed indexing bug
- No bugs atm :D
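
The fix itself is not shown in these notes. A minimal sketch of one way to get consistent indices after joining several goal lists is to renumber everything once at the end. The `Goal` dataclass below is a stand-in that only mirrors the fields visible when goals are printed (`question`, `visualization`, `rationale`, `index`); it is not lida's class, and `combine_and_reindex` is a hypothetical name.

```python
from dataclasses import dataclass, replace

@dataclass
class Goal:
    """Stand-in for a goal object; not the lida implementation."""
    question: str
    visualization: str
    rationale: str
    index: int = 0

def combine_and_reindex(*goal_lists):
    """Concatenate goal lists and assign indices 0..k-1 in order."""
    combined = [goal for goals in goal_lists for goal in goals]
    return [replace(goal, index=i) for i, goal in enumerate(combined)]

categorical = [Goal("q-a", "bar chart of Type", "r-a", index=0)]
mixed = [Goal("q-b", "scatter plot of Weight vs Len", "r-b", index=0),
         Goal("q-c", "box plot of Weight by Type", "r-c", index=1)]

for goal in combine_and_reindex(categorical, mixed):
    print(goal.index, goal.visualization)
```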

### Ver 2.5.5 - Generate goals for numbers too and add it to the combined
- Generated goals that feature the data type `numbers` too.
- bug: forgot to update the `generate` function to add the `number_goals`

### Ver 2.5.6 - Fixed bug in 2.5.5 LOL

### Ver 2.5.7 - Restored version hehe
- Broke the old version so I had to restore it. This is just the restored version. Might have some slight changes from the previous one but not significant.

### Ver 2.5.7 - Modified rationale prompt
- modified `SYSTEM_INSTRUCTIONS` in `goal.py`

``` python
SYSTEM_INSTRUCTIONS = """
You are an experienced data analyst who can generate a given number of insightful GOALS about data, when given a summary of the data, and a specified persona. The VISUALIZATIONS YOU RECOMMEND MUST FOLLOW VISUALIZATION BEST PRACTICES (e.g., must use bar charts instead of pie charts for comparing quantities) AND BE MEANINGFUL (e.g., plot longitude and latitude on maps where appropriate). They must also be relevant to the specified persona. Each goal must include a question, a visualization (THE VISUALIZATION MUST REFERENCE THE EXACT COLUMN FIELDS FROM THE SUMMARY), and a rationale (JUSTIFICATION FOR WHICH dataset FIELDS ARE USED and what we will learn from the visualization and why the visualization was chosen). Each goal MUST mention the exact fields from the dataset summary above
"""
```

### Ver 2.5.8 - Added error catching for unsupported `focus` types
- Made `focus` types more specific and added error catching for unsupported types.
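
A minimal sketch of what such a guard could look like, assuming the supported focus names are kept in a set. The names in `SUPPORTED_FOCUSES` and the `validate_focus` helper are placeholders for illustration, not the identifiers used in the notebook's code.

```python
# Hypothetical focus names; the real ones may differ.
SUPPORTED_FOCUSES = {"categorical", "date", "number", "mix_2", "mix_3"}

def validate_focus(focus: str) -> str:
    """Return the focus unchanged, or raise ValueError if it is unsupported."""
    if focus not in SUPPORTED_FOCUSES:
        supported = ", ".join(sorted(SUPPORTED_FOCUSES))
        raise ValueError(f"Unsupported focus {focus!r}; expected one of: {supported}")
    return focus

print(validate_focus("date"))
try:
    validate_focus("geo")
except ValueError as err:
    print(err)
```

Failing early with the list of accepted values makes a mistyped `focus` visible before any prompt is sent.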

### Ver 2.5.9 - Removed the added prompt for reducing duplication
- Rationale: It still generated a duplicate anyway, and removing it doesn't change the output much.
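
If duplicates keep appearing regardless of the prompt, one alternative is to filter them after generation. The sketch below is not something this notebook does; it treats two goals as duplicates when their visualization strings match after lowercasing and whitespace normalization, and it uses plain dictionaries with a `visualization` key in place of goal objects.

```python
def drop_duplicate_goals(goals):
    """Keep the first goal for each normalized visualization string."""
    seen = set()
    unique = []
    for goal in goals:
        key = " ".join(goal["visualization"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(goal)
    return unique

sample = [
    {"question": "How does the City Miles Per Gallon vary by car Type?",
     "visualization": "box plot of City_Miles_Per_Gallon grouped by Type"},
    {"question": "How does Engine Size affect Horsepower?",
     "visualization": "scatter plot of Engine_Size__l_ vs Horsepower_HP_"},
    {"question": "What is the distribution of City Miles Per Gallon across different types of cars?",
     "visualization": "box plot of City_Miles_Per_Gallon grouped by Type"},
]

for goal in drop_duplicate_goals(sample):
    print(goal["visualization"])
```

An exact-match key like this only catches goals that request the same chart in the same words; near-duplicates with different phrasing would need a looser comparison.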

## Setup

In [1]:
%pip uninstall -y lida 
# !pip install lida[infographics] # for infographics support

Note: you may need to restart the kernel to use updated packages.




In [2]:
%pip show lida

Note: you may need to restart the kernel to use updated packages.




In [3]:
%load_ext autoreload
%autoreload 2

In [4]:
import sys
import os
import pprint

In [5]:
from dotenv import load_dotenv

load_dotenv()

True

In [6]:
sys.path.append(os.path.abspath('../..'))

In [7]:
from lida.components.manager import Manager
from llmx import TextGenerationConfig, llm

In [8]:
lida = Manager(text_gen = llm("openai", api_key=os.getenv("APIKEY"))) # !! api key
textgen_config = TextGenerationConfig(n=1, temperature=0.5, model="gpt-4o-mini", use_cache=True)

In [9]:
display(lida)

<lida.components.manager.Manager at 0x2225a3f3e30>

## Summarize Data, Generate Goals

### Summarize

In [10]:
summary = lida.summarize("../cars.csv", summary_method="default", textgen_config=textgen_config)  
pprint.pprint(summary)

here
object
{'Sedan': 234, 'SUV': 59, 'Sports Car': 45, 'Wagon': 29, 'Minivan': 20}
{'dataset_description': '',
 'field_names': ['Name',
                 'Type',
                 'AWD',
                 'RWD',
                 'Retail_Price',
                 'Dealer_Cost',
                 'Engine_Size__l_',
                 'Cyl',
                 'Horsepower_HP_',
                 'City_Miles_Per_Gallon',
                 'Highway_Mi

In [11]:
for data in summary['fields']:
    print(data['properties']['dtype'])

string
category
number
number
number
number
number
number
number
number
number
number
number
number
number


### Generate Goals without a Persona

In [12]:
goals = lida.goals(summary, n=20, textgen_config=textgen_config)

for goal in goals:
    display(goal)


### Goal 0
---
**Question:** What are the average Retail Prices for different car Types?

**Visualization:** `bar chart of Type vs. average Retail_Price`

**Rationale:** This visualization will help us understand how the average retail price varies across different car types (Sedan, SUV, Sports Car, etc.), allowing us to identify which categories are more expensive and which are more affordable. This is crucial for market analysis and pricing strategies.



### Goal 1
---
**Question:** How does the distribution of Engine Size differ across car Types?

**Visualization:** `box plot of Engine_Size__l_ grouped by Type`

**Rationale:** Using a box plot will allow us to visualize the distribution of engine sizes for each car type. This can reveal insights about performance characteristics and market segmentation, indicating whether certain types of cars tend to have larger or smaller engines.



### Goal 2
---
**Question:** What is the relationship between Horsepower and car Type?

**Visualization:** `scatter plot of Horsepower_HP_ vs. Type`

**Rationale:** A scatter plot will help us identify any trends or clusters between horsepower and car types. This analysis can provide insights into performance expectations associated with various types of vehicles, aiding in consumer decision-making.



### Goal 3
---
**Question:** What percentage of each car Type has AWD?

**Visualization:** `bar chart of Type vs. percentage of AWD`

**Rationale:** This bar chart will show the proportion of all car types that are equipped with all-wheel drive (AWD). Understanding the prevalence of AWD in different categories can inform both consumer preferences and manufacturer strategies.



### Goal 4
---
**Question:** How does the City Miles Per Gallon vary by car Type?

**Visualization:** `box plot of City_Miles_Per_Gallon grouped by Type`

**Rationale:** A box plot will allow us to see the distribution of city miles per gallon across different car types, highlighting which categories are more fuel-efficient. This is important for consumers concerned about fuel costs and environmental impact.



### Goal 5
---
**Question:** What is the relationship between Retail Price and Dealer Cost?

**Visualization:** `scatter plot of Retail_Price vs Dealer_Cost`

**Rationale:** This visualization will help us understand how closely the retail price correlates with the dealer cost, indicating pricing strategies and potential profit margins. By plotting 'Retail_Price' against 'Dealer_Cost', we can identify trends, outliers, and overall pricing behavior in the dataset.



### Goal 6
---
**Question:** How does Engine Size affect Horsepower?

**Visualization:** `scatter plot of Engine_Size__l_ vs Horsepower_HP_`

**Rationale:** This scatter plot will show the relationship between engine size and horsepower, allowing us to analyze whether larger engines generally produce more power. By examining 'Engine_Size__l_' against 'Horsepower_HP_', we can gain insights into engineering trends and performance expectations.



### Goal 7
---
**Question:** What is the distribution of City Miles Per Gallon across different types of cars?

**Visualization:** `box plot of City_Miles_Per_Gallon grouped by Type`

**Rationale:** A box plot will provide a clear view of the distribution of city mileage for various car types, highlighting differences in fuel efficiency. By using 'City_Miles_Per_Gallon' and grouping by 'Type', we can identify which types of vehicles are more efficient and where outliers exist.



### Goal 8
---
**Question:** How do Weight and Length correlate with Retail Price?

**Visualization:** `multiple regression analysis plot of Weight and Len against Retail_Price`

**Rationale:** This analysis will help us understand how the weight and length of a vehicle impact its retail price. By analyzing 'Weight' and 'Len' in relation to 'Retail_Price', we can uncover patterns that may indicate consumer preferences or market trends.



### Goal 9
---
**Question:** What is the average Highway Miles Per Gallon for cars with different numbers of Cylinders?

**Visualization:** `bar chart of average Highway_Miles_Per_Gallon grouped by Cyl`

**Rationale:** A bar chart will effectively display the average highway mileage for cars based on the number of cylinders. By aggregating 'Highway_Miles_Per_Gallon' by 'Cyl', we can assess how engine configuration affects fuel efficiency on the highway.



### Goal 10
---
**Question:** How does the Retail Price vary with Engine Size and Type of vehicle?

**Visualization:** `scatter plot of Retail_Price vs Engine_Size__l_ colored by Type`

**Rationale:** This visualization will help us understand the relationship between engine size and retail price while categorizing the data by vehicle type. By using a scatter plot, we can identify trends or clusters that indicate how different types of vehicles (e.g., SUVs, Sedans) are priced relative to their engine size.



### Goal 11
---
**Question:** What is the relationship between Horsepower, Weight, and Type of vehicle?

**Visualization:** `3D scatter plot of Horsepower_HP_ vs Weight colored by Type`

**Rationale:** This 3D scatter plot will allow us to visualize the interaction between horsepower and weight across different vehicle types. It will help in understanding how these two variables correlate and whether certain types of vehicles (like Sports Cars) tend to have higher horsepower relative to their weight.



### Goal 12
---
**Question:** How do City and Highway Miles Per Gallon relate to Retail Price across different Types?

**Visualization:** `grouped bar chart of average City_Miles_Per_Gallon and Highway_Miles_Per_Gallon by Type`

**Rationale:** This grouped bar chart will provide insights into the fuel efficiency of different vehicle types in relation to their retail price. By comparing average city and highway MPG, we can evaluate how efficiency varies and whether it has an impact on pricing strategies for different types of vehicles.



### Goal 13
---
**Question:** What is the impact of the number of Cylinders on Retail Price and Weight?

**Visualization:** `box plot of Retail_Price and Weight by Cyl`

**Rationale:** Using a box plot allows us to see the distribution of retail price and weight across different cylinder counts. This visualization will help identify trends in pricing and weight as the number of cylinders increases, revealing insights into how engine design impacts vehicle characteristics.



### Goal 14
---
**Question:** How does the combination of AWD and RWD affect the Horsepower and Retail Price?

**Visualization:** `heatmap of average Horsepower_HP_ and Retail_Price by AWD and RWD`

**Rationale:** This heatmap will illustrate the interaction between all-wheel drive (AWD), rear-wheel drive (RWD), horsepower, and retail price. By analyzing this data, we can uncover patterns that indicate how drivetrain configurations influence performance and market pricing.



### Goal 15
---
**Question:** How does the Retail Price relate to Engine Size?

**Visualization:** `scatter plot of Retail_Price vs Engine_Size__l_`

**Rationale:** This visualization will help us understand if there is a correlation between the size of the engine (Engine_Size__l_) and the retail price (Retail_Price) of the cars. A scatter plot is appropriate here as it allows for the observation of relationships between two continuous variables, revealing any trends or patterns.



### Goal 16
---
**Question:** What is the relationship between Horsepower and Weight?

**Visualization:** `scatter plot of Horsepower_HP_ vs Weight`

**Rationale:** This scatter plot will illustrate how the horsepower (Horsepower_HP_) of a vehicle correlates with its weight (Weight). This is crucial for understanding performance dynamics; heavier cars may require more horsepower for better acceleration. The scatter plot allows for easy identification of correlations and outliers.



### Goal 17
---
**Question:** How does City Miles Per Gallon compare to Highway Miles Per Gallon?

**Visualization:** `scatter plot of City_Miles_Per_Gallon vs Highway_Miles_Per_Gallon`

**Rationale:** This visualization will provide insights into how city fuel efficiency (City_Miles_Per_Gallon) relates to highway fuel efficiency (Highway_Miles_Per_Gallon). A scatter plot is suitable for this analysis as it can reveal any linear relationships, indicating whether vehicles that perform well in the city also perform well on the highway.



### Goal 18
---
**Question:** Is there a relationship between the number of Cylinders and Horsepower?

**Visualization:** `box plot of Horsepower_HP_ grouped by Cyl`

**Rationale:** Using a box plot to visualize Horsepower_HP_ against the number of Cylinders (Cyl) will allow us to compare the distribution of horsepower across different cylinder counts. This is useful for understanding how engine configuration affects performance, and the box plot effectively summarizes the data's central tendency and variability.



### Goal 19
---
**Question:** What is the impact of car Type on Retail Price?

**Visualization:** `box plot of Retail_Price grouped by Type`

**Rationale:** This box plot will show how the type of car (Type) influences its retail price (Retail_Price). By categorizing the data into different types of vehicles, we can assess the price range and median for each type, providing valuable insights for pricing strategies and market positioning.


### Saving and Loading Goals without a Persona

In [13]:
import pickle

In [14]:
# SAVE

with open('goals2-5-9-20-combined-cars.pkl', 'wb') as f:
    pickle.dump(goals, f)

In [19]:
# LOAD

with open('goals2-5-6-20-combined-cars.pkl', 'rb') as f:
    loaded = pickle.load(f)

display(loaded)

[Goal(question='What is the distribution of car types in the dataset?', visualization='bar chart of Type', rationale="This visualization uses the 'Type' column to show the frequency of each car type (e.g., Sedan, SUV, Sports Car, Wagon, Minivan). A bar chart is appropriate here as it allows for easy comparison of the number of cars in each category, helping to identify which types are most prevalent and potentially influencing market trends.", index=0),
 Goal(question='How do retail prices vary by car type?', visualization='box plot of Retail_Price by Type', rationale="Using the 'Retail_Price' column grouped by 'Type' allows us to visualize the distribution of prices for each car type. A box plot is effective here as it shows the median, quartiles, and potential outliers, giving insights into price ranges and how they differ among various car types.", index=1),
 Goal(question='What is the average engine size for each car type?', visualization='bar chart of average Engine_Size__l_ by Ty

In [20]:
for goal in loaded:
    display(goal)


### Goal 0
---
**Question:** What is the distribution of car types in the dataset?

**Visualization:** `bar chart of Type`

**Rationale:** This visualization uses the 'Type' column to show the frequency of each car type (e.g., Sedan, SUV, Sports Car, Wagon, Minivan). A bar chart is appropriate here as it allows for easy comparison of the number of cars in each category, helping to identify which types are most prevalent and potentially influencing market trends.



### Goal 1
---
**Question:** How do retail prices vary by car type?

**Visualization:** `box plot of Retail_Price by Type`

**Rationale:** Using the 'Retail_Price' column grouped by 'Type' allows us to visualize the distribution of prices for each car type. A box plot is effective here as it shows the median, quartiles, and potential outliers, giving insights into price ranges and how they differ among various car types.



### Goal 2
---
**Question:** What is the average engine size for each car type?

**Visualization:** `bar chart of average Engine_Size__l_ by Type`

**Rationale:** This visualization will use the 'Engine_Size__l_' column averaged by 'Type' to illustrate the differences in engine size across car types. A bar chart is suitable for this purpose, as it allows for straightforward comparison of average engine sizes, which can be indicative of performance characteristics associated with each type.



### Goal 3
---
**Question:** What is the relationship between car type and fuel efficiency (City and Highway MPG)?

**Visualization:** `grouped bar chart of average City_Miles_Per_Gallon and Highway_Miles_Per_Gallon by Type`

**Rationale:** This visualization will utilize 'City_Miles_Per_Gallon' and 'Highway_Miles_Per_Gallon' to compare average fuel efficiency across different car types. A grouped bar chart allows for a clear comparison between city and highway MPG for each type, aiding in understanding how vehicle design impacts fuel efficiency.



### Goal 4
---
**Question:** What is the impact of car type on weight?

**Visualization:** `box plot of Weight by Type`

**Rationale:** Using the 'Weight' column categorized by 'Type' helps visualize how the weight of cars differs among types. A box plot is appropriate as it provides a detailed view of the weight distribution, including median and outliers, which can inform decisions related to safety, performance, and efficiency based on vehicle weight.



### Goal 5
---
**Question:** How does the horsepower relate to the retail price of cars?

**Visualization:** `scatter plot of Horsepower_HP_ vs Retail_Price`

**Rationale:** This visualization will help us understand the correlation between horsepower and retail price, allowing us to identify if more powerful cars tend to have higher prices.



### Goal 6
---
**Question:** What is the relationship between engine size and fuel efficiency in terms of City and Highway MPG?

**Visualization:** `line chart comparing Engine_Size__l_ with City_Miles_Per_Gallon and Highway_Miles_Per_Gallon`

**Rationale:** Using a line chart will allow us to visualize the trends in fuel efficiency as engine size increases, helping to identify if larger engines correlate with lower MPG.



### Goal 7
---
**Question:** How does weight affect the horsepower of different car types?

**Visualization:** `box plot of Weight vs Horsepower_HP_ grouped by Type`

**Rationale:** A box plot will enable us to compare the distribution of horsepower across different weight categories for each car type, highlighting potential trends and outliers.



### Goal 8
---
**Question:** What is the impact of the number of cylinders on the retail price and dealer cost?

**Visualization:** `bar chart comparing average Retail_Price and Dealer_Cost by Cyl`

**Rationale:** A bar chart will effectively show the average retail price and dealer cost across different cylinder counts, helping to identify if more cylinders lead to higher costs.



### Goal 9
---
**Question:** How does the length of the car relate to its weight and wheelbase?

**Visualization:** `3D scatter plot of Len, Weight, and Wheel_Base`

**Rationale:** A 3D scatter plot will allow us to visualize the relationships between car length, weight, and wheelbase, providing insights into how these dimensions interact with each other.



### Goal 10
---
**Question:** How does the retail price vary with engine size and horsepower across different car types?

**Visualization:** `3D scatter plot of Retail_Price vs Engine_Size__l_ vs Horsepower_HP_ colored by Type`

**Rationale:** This visualization allows us to explore the relationship between retail price, engine size, and horsepower, while also differentiating between car types. It will help identify if larger engines and higher horsepower correlate with higher retail prices across various types of cars.



### Goal 11
---
**Question:** What is the relationship between weight, engine size, and retail price for different car types?

**Visualization:** `Bubble chart of Retail_Price on the y-axis, Weight on the x-axis, and Engine_Size__l_ represented by bubble size, colored by Type`

**Rationale:** This bubble chart will illustrate how weight and engine size affect retail price, while also showing the distribution of different car types. It provides insights into whether heavier cars with larger engines tend to be priced higher, and how this varies by car type.



### Goal 12
---
**Question:** How do city miles per gallon compare with highway miles per gallon based on engine size and number of cylinders?

**Visualization:** `Grouped bar chart of City_Miles_Per_Gallon and Highway_Miles_Per_Gallon grouped by Engine_Size__l_ and colored by Cyl`

**Rationale:** This grouped bar chart allows for a direct comparison of city and highway miles per gallon across different engine sizes and cylinder counts. It will help us understand how engine specifications influence fuel efficiency in urban versus highway driving conditions.



### Goal 13
---
**Question:** What is the impact of wheelbase on the weight and retail price across different car types?

**Visualization:** `Line chart of Weight and Retail_Price against Wheel_Base, with separate lines for each Type`

**Rationale:** This line chart will help visualize the trends in weight and retail price as wheelbase increases, segmented by car type. It will provide insights into how wheelbase affects both the mass and market value of different vehicle categories.



### Goal 14
---
**Question:** How does the distribution of horsepower vary with the number of cylinders and car type?

**Visualization:** `Box plot of Horsepower_HP_ grouped by Cyl and colored by Type`

**Rationale:** This box plot will show the distribution of horsepower across different cylinder counts for each car type. It will allow us to identify patterns and outliers in horsepower based on engine configuration, providing insights into performance characteristics associated with different types of vehicles.



### Goal 15
---
**Question:** How does the retail price relate to the engine size across different car types?

**Visualization:** `scatter plot of Retail_Price vs Engine_Size__l_ colored by Type`

**Rationale:** This visualization will help us understand if there's a correlation between engine size and retail price, while also allowing us to see if this relationship differs by car type. By plotting Retail_Price against Engine_Size__l_ and using Type to differentiate the points, we can gain insights into how engine size impacts pricing across various categories of cars.



### Goal 16
---
**Question:** What is the relationship between horsepower and weight for different car types?

**Visualization:** `scatter plot of Horsepower_HP_ vs Weight colored by Type`

**Rationale:** This visualization will illustrate the relationship between horsepower and weight, which is crucial for understanding performance characteristics. By differentiating the points by Type, we can see if certain types of cars (e.g., Sports Cars vs. SUVs) have different dynamics in terms of horsepower relative to their weight.



### Goal 17
---
**Question:** How do city miles per gallon compare with highway miles per gallon across different engine sizes?

**Visualization:** `box plot comparing City_Miles_Per_Gallon and Highway_Miles_Per_Gallon grouped by Engine_Size__l_`

**Rationale:** This visualization will reveal how fuel efficiency varies between city and highway driving conditions for various engine sizes. By using a box plot, we can effectively summarize the distribution of MPG values for both city and highway, allowing us to identify trends and outliers in fuel efficiency based on engine size.



### Goal 18
---
**Question:** What is the impact of the number of cylinders on retail price and dealer cost?

**Visualization:** `dual-axis bar chart comparing average Retail_Price and average Dealer_Cost by Cyl`

**Rationale:** This dual-axis bar chart will allow us to compare the average retail price and dealer cost across different cylinder counts. By using this format, we can visualize how the number of cylinders influences both retail pricing and dealer costs, which is essential for understanding market dynamics.



### Goal 19
---
**Question:** How does the weight of the car correlate with its length and width?

**Visualization:** `3D scatter plot of Weight vs Len and Width`

**Rationale:** This 3D scatter plot will help us visualize the relationship between car weight, length, and width. Understanding how these dimensions correlate can provide insights into design trends and structural characteristics of different car types, which is important for a comprehensive analysis of automotive data.


### Generating Goals with a Persona

In [14]:
# goals can also be based on a persona 
persona = "a mechanic who wants to buy a car that is cheap but has good gas mileage"
personal_goals = lida.goals(summary, n=20, persona=persona, textgen_config=textgen_config)
for goal in personal_goals:
    display(goal)


### Goal 0
---
**Question:** What is the distribution of Retail Price?

**Visualization:** `histogram of Retail_Price`

**Rationale:** This tells us about the range of car prices available, helping the mechanic identify affordable options.



### Goal 1
---
**Question:** How does City Miles Per Gallon vary with Retail Price?

**Visualization:** `scatter plot of City_Miles_Per_Gallon vs Retail_Price`

**Rationale:** This helps assess whether higher-priced cars offer better fuel efficiency.



### Goal 2
---
**Question:** What is the correlation between Highway Miles Per Gallon and City Miles Per Gallon?

**Visualization:** `scatter plot of Highway_Miles_Per_Gallon vs City_Miles_Per_Gallon`

**Rationale:** Understanding this relationship can help the mechanic predict fuel efficiency based on city driving conditions.



### Goal 3
---
**Question:** Which car types have the best average City Miles Per Gallon?

**Visualization:** `bar chart of average City_Miles_Per_Gallon by Type`

**Rationale:** This allows the mechanic to quickly identify which types of cars are more fuel-efficient.



### Goal 4
---
**Question:** What is the distribution of Engine Sizes across different car types?

**Visualization:** `box plot of Engine_Size__l_ by Type`

**Rationale:** This shows how engine size varies across types, which can impact fuel efficiency.



### Goal 5
---
**Question:** How does the number of Cylinders affect City Miles Per Gallon?

**Visualization:** `box plot of City_Miles_Per_Gallon by Cyl`

**Rationale:** This helps the mechanic understand how engine configuration influences fuel efficiency.



### Goal 6
---
**Question:** What is the relationship between Horsepower and City Miles Per Gallon?

**Visualization:** `scatter plot of Horsepower_HP_ vs City_Miles_Per_Gallon`

**Rationale:** This can indicate if more powerful cars compromise fuel efficiency.



### Goal 7
---
**Question:** How do the weights of cars affect their fuel efficiency?

**Visualization:** `scatter plot of Weight vs City_Miles_Per_Gallon`

**Rationale:** This helps the mechanic see if lighter cars generally provide better fuel economy.



### Goal 8
---
**Question:** What is the average Retail Price for cars with good gas mileage (above 25 MPG)?

**Visualization:** `bar chart of average Retail_Price for City_Miles_Per_Gallon > 25`

**Rationale:** This gives insight into the cost of more fuel-efficient cars.



### Goal 9
---
**Question:** What is the average Dealer Cost for cars with high Highway Miles Per Gallon?

**Visualization:** `bar chart of average Dealer_Cost for Highway_Miles_Per_Gallon > 30`

**Rationale:** This can help the mechanic identify cost-effective options for high-mileage vehicles.



### Goal 10
---
**Question:** How does the length of a car relate to its fuel efficiency?

**Visualization:** `scatter plot of Len vs City_Miles_Per_Gallon`

**Rationale:** Understanding this relationship can help assess if larger cars are less efficient.



### Goal 11
---
**Question:** What is the distribution of AWD and RWD cars in terms of fuel efficiency?

**Visualization:** `box plot of City_Miles_Per_Gallon by AWD and RWD`

**Rationale:** This helps the mechanic understand how drivetrain configurations impact gas mileage.



### Goal 12
---
**Question:** What is the average weight of cars grouped by their Type?

**Visualization:** `bar chart of average Weight by Type`

**Rationale:** This can provide insights into how the type of car affects its overall weight and potentially its fuel efficiency.



### Goal 13
---
**Question:** How does the wheelbase length correlate with fuel efficiency?

**Visualization:** `scatter plot of Wheel_Base vs City_Miles_Per_Gallon`

**Rationale:** This can help the mechanic understand if longer wheelbases lead to worse fuel economy.



### Goal 14
---
**Question:** What is the average Highway Miles Per Gallon for cars with different engine sizes?

**Visualization:** `bar chart of average Highway_Miles_Per_Gallon by Engine_Size__l_`

**Rationale:** This can help identify if larger engines compromise highway fuel efficiency.



### Goal 15
---
**Question:** What is the relationship between the number of cylinders and Highway Miles Per Gallon?

**Visualization:** `box plot of Highway_Miles_Per_Gallon by Cyl`

**Rationale:** This shows how engine configuration impacts highway fuel efficiency.



### Goal 16
---
**Question:** What percentage of cars have good fuel efficiency (above 25 MPG)?

**Visualization:** `pie chart of percentage of cars with City_Miles_Per_Gallon > 25`

**Rationale:** This gives a quick overview of the availability of fuel-efficient options.



### Goal 17
---
**Question:** What is the average Retail Price for cars categorized as SUVs versus Sedans?

**Visualization:** `bar chart of average Retail_Price by Type`

**Rationale:** This helps the mechanic compare the cost of different car types.



### Goal 18
---
**Question:** How does the average horsepower vary among different types of cars?

**Visualization:** `bar chart of average Horsepower_HP_ by Type`

**Rationale:** This may indicate if more powerful cars are generally more expensive or less fuel-efficient.



### Goal 19
---
**Question:** What is the relationship between car width and fuel efficiency?

**Visualization:** `scatter plot of Width vs City_Miles_Per_Gallon`

**Rationale:** This can help determine if wider cars tend to have lower fuel efficiency.
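
To see how the persona goals are spread across chart types, the `visualization` strings can be tallied. The following is a minimal sketch, not part of LIDA: the list is copied from the 20 goals above, and `chart_type` is a hypothetical helper that assumes every string has the form `<chart type> of <fields>`.

```python
from collections import Counter

# Visualization strings copied from the 20 persona goals above.
visualizations = [
    "histogram of Retail_Price",
    "scatter plot of City_Miles_Per_Gallon vs Retail_Price",
    "scatter plot of Highway_Miles_Per_Gallon vs City_Miles_Per_Gallon",
    "bar chart of average City_Miles_Per_Gallon by Type",
    "box plot of Engine_Size__l_ by Type",
    "box plot of City_Miles_Per_Gallon by Cyl",
    "scatter plot of Horsepower_HP_ vs City_Miles_Per_Gallon",
    "scatter plot of Weight vs City_Miles_Per_Gallon",
    "bar chart of average Retail_Price for City_Miles_Per_Gallon > 25",
    "bar chart of average Dealer_Cost for Highway_Miles_Per_Gallon > 30",
    "scatter plot of Len vs City_Miles_Per_Gallon",
    "box plot of City_Miles_Per_Gallon by AWD and RWD",
    "bar chart of average Weight by Type",
    "scatter plot of Wheel_Base vs City_Miles_Per_Gallon",
    "bar chart of average Highway_Miles_Per_Gallon by Engine_Size__l_",
    "box plot of Highway_Miles_Per_Gallon by Cyl",
    "pie chart of percentage of cars with City_Miles_Per_Gallon > 25",
    "bar chart of average Retail_Price by Type",
    "bar chart of average Horsepower_HP_ by Type",
    "scatter plot of Width vs City_Miles_Per_Gallon",
]


def chart_type(visualization: str) -> str:
    """Hypothetical helper: return the text before the first ' of '."""
    return visualization.split(" of ", 1)[0]


# Count how often each chart type appears, most frequent first.
chart_counts = Counter(chart_type(v) for v in visualizations)
for name, count in chart_counts.most_common():
    print(f"{name}: {count}")
```

For this list the tally is 7 scatter plots, 7 bar charts, 4 box plots, 1 histogram and 1 pie chart. Going by the `Goal(...)` repr in the load output, the same list could probably be built in the notebook with `[g.visualization for g in personal_goals]`.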


### Saving and Loading Goals with a Persona

In [15]:
# SAVE: persist the persona goals so they can be reused without regenerating them

with open('goals-persona.pkl', 'wb') as f:
    pickle.dump(personal_goals, f)
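
Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code. If the goals need to be shared or inspected by hand, a plain-text format may be a safer choice. The sketch below round-trips goals through JSON. It is an illustration, not LIDA's API: `GoalRecord`, `goals_to_json` and `goals_from_json` are hypothetical names, and the four fields are taken from the `Goal(...)` repr shown in the load output.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class GoalRecord:
    """Hypothetical stand-in with the fields shown in the Goal repr."""
    question: str
    visualization: str
    rationale: str
    index: int


def goals_to_json(goals: List[GoalRecord]) -> str:
    """Serialize goals to a JSON string, one object per goal."""
    return json.dumps([asdict(g) for g in goals], indent=2)


def goals_from_json(text: str) -> List[GoalRecord]:
    """Rebuild GoalRecord objects from a JSON string."""
    return [GoalRecord(**item) for item in json.loads(text)]


# Sample record copied from Goal 0 of the persona goals.
sample = [
    GoalRecord(
        question="What is the distribution of Retail Price?",
        visualization="histogram of Retail_Price",
        rationale=(
            "This tells us about the range of car prices available, "
            "helping the mechanic identify affordable options."
        ),
        index=0,
    )
]

restored = goals_from_json(goals_to_json(sample))
print(restored == sample)
```

The JSON string can be written to disk with an ordinary text-mode `open(...)` call in place of the binary pickle file.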

In [16]:
# LOAD: read the saved persona goals back from disk

with open('goals-persona.pkl', 'rb') as f:
    loaded = pickle.load(f)

display(loaded)

[Goal(question='What is the distribution of Retail Price?', visualization='histogram of Retail_Price', rationale='This tells us about the range of car prices available, helping the mechanic identify affordable options.', index=0),
 Goal(question='How does City Miles Per Gallon vary with Retail Price?', visualization='scatter plot of City_Miles_Per_Gallon vs Retail_Price', rationale='This helps assess whether higher-priced cars offer better fuel efficiency.', index=1),
 Goal(question='What is the correlation between Highway Miles Per Gallon and City Miles Per Gallon?', visualization='scatter plot of Highway_Miles_Per_Gallon vs City_Miles_Per_Gallon', rationale='Understanding this relationship can help the mechanic predict fuel efficiency based on city driving conditions.', index=2),
 Goal(question='Which car types have the best average City Miles Per Gallon?', visualization='bar chart of average City_Miles_Per_Gallon by Type', rationale='This allows the mechanic to quickly identify whic