# 1. Project LIDA

LIDA treats visualizations as code and provides a clean API for generating, executing, editing, explaining, evaluating, and repairing visualization code.

- ✅ Data Summarization
- ✅ Goal Generation
- ✅ Visualization Generation
- ✅ Visualization Editing
- ✅ Visualization Explanation
- ✅ Visualization Evaluation and Repair
- ✅ Visualization Recommendation
- ✅ Infographic Generation (beta), requires `pip install lida[infographics]`

In [27]:
# 1. Getting Started

from lida import Manager, llm

csvfile = "./../../data/kaggle/IPL-2022.csv"
lida = Manager(text_gen = llm("openai")) # palm, cohere ..
summary = lida.summarize(csvfile)
goals = lida.goals(summary, n=10) # exploratory data analysis

# create a list of dictionaries containing the goal information
import pandas as pd
goal_list = []
for goal in goals:
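    # include the rationale so the table matches the three columns shown in the output
    goal_dict = {'Question': goal.question, 'Visualization': goal.visualization, 'Rationale': goal.rationale}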
    goal_list.append(goal_dict)
df = pd.DataFrame(goal_list)
display(df)



| | Question | Visualization | Rationale |
|---|---|---|---|
| 0 | What is the distribution of first innings scores? | Histogram of first_ings_score | This visualization will show the frequency distribution of first innings scores, allowing us to ... |
| 1 | Which team has won the most matches? | Bar chart of match_winner | This visualization will show the number of matches won by each team, allowing us to identify the... |
| 2 | What is the distribution of margins of victory? | Histogram of margin | This visualization will show the frequency distribution of margins of victory, allowing us to un... |
| 3 | Which player has won the most 'Player of the Match' awards? | Bar chart of player_of_the_match | This visualization will show the number of 'Player of the Match' awards won by each player, allo... |
| 4 | What is the distribution of high scores? | Histogram of highscore | This visualization will show the frequency distribution of high scores, allowing us to understan... |
| 5 | Which team has the highest average first innings score? | Bar chart of team1 and first_ings_score | This visualization will show the average first innings score for each team, allowing us to ident... |
| 6 | What is the distribution of best bowling figures? | Histogram of best_bowling_figure | This visualization will show the frequency distribution of best bowling figures, allowing us to ... |
| 7 | Which team has the highest number of wickets in the second innings? | Bar chart of team2 and second_ings_wkts | This visualization will show the number of wickets taken by each team in the second innings, all... |
| 8 | What is the distribution of first innings wickets? | Histogram of first_ings_wkts | This visualization will show the frequency distribution of first innings wickets, allowing us to... |
| 9 | Which team has the highest number of wins in the playoffs stage? | Bar chart of team1 and stage=Playoff | This visualization will show the number of wins for each team in the playoffs stage, allowing us... |


In [None]:
# Explore Goal 1

charts = lida.visualize(summary=summary, goal=goals[0]) # exploratory data analysis
print("Charts length:", len(charts))
charts[0]

In [None]:
# Explore Goal 2
print("Goal index:", goals[1].index)
print("Goal question:", goals[1].question)
print("Goal rationale:", goals[1].rationale)
print("Goal dataviz:", goals[1].visualization)
# generate the charts for this goal first, then report how many were produced
charts = lida.visualize(summary=summary, goal=goals[1]) # exploratory data analysis
print("\nCharts length:", len(charts))
charts[0]

### 4.3 Caching
Each manager method takes a `textgen_config` argument (a `TextGenerationConfig` object in the cells below) that configures the text generation process, with parameters for model, temperature, max_tokens, top-k, etc. One of its settings is `use_cache`. If set to `True`, the manager caches the generated text associated with that method call. Use it to speed up repeated runs and to avoid hitting API limits.
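
To illustrate what `use_cache` buys you, here is a minimal sketch of response caching keyed on the prompt and the generation settings. It is not LIDA's implementation: `CachedGenerator`, `fake_generate`, and the plain-dict config are hypothetical names used only for illustration, and the stub replaces a real LLM call so the example runs offline.

```python
import hashlib
import json


class CachedGenerator:
    """Wrap a text-generation callable with an in-memory cache (illustrative only)."""

    def __init__(self, generate_fn):
        self.generate_fn = generate_fn  # hypothetical callable: (prompt, config) -> str
        self.cache = {}
        self.calls = 0  # how many times the underlying generator was actually invoked

    def _key(self, prompt, config):
        # Serialize deterministically so identical requests map to the same key.
        payload = json.dumps({"prompt": prompt, "config": config}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def generate(self, prompt, config, use_cache=True):
        key = self._key(prompt, config)
        if use_cache and key in self.cache:
            return self.cache[key]
        self.calls += 1
        result = self.generate_fn(prompt, config)
        if use_cache:
            self.cache[key] = result
        return result


def fake_generate(prompt, config):
    # Stand-in for a real LLM call.
    return f"response to {prompt!r} at temperature {config['temperature']}"


gen = CachedGenerator(fake_generate)
config = {"model": "gpt-3.5-turbo-0301", "temperature": 0.5}
first = gen.generate("summarize the dataset", config)
second = gen.generate("summarize the dataset", config)  # served from the cache
third = gen.generate("summarize the dataset", {**config, "temperature": 0.2})  # new key
print(gen.calls)
```

Note that in this sketch any change to the settings (here, the temperature) produces a different key, so a cached response is reused only for an identical request.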

In [None]:
# !pip install lida 
# !pip install lida[infographics] # for infographics support

In [None]:
from lida import Manager, TextGenerationConfig , llm  

## 5. Summarize Data, Generate Goals

In [None]:
lida = Manager(text_gen = llm("openai", api_key=None)) # !! api key
textgen_config = TextGenerationConfig(n=1, temperature=0.5, model="gpt-3.5-turbo-0301", use_cache=True)
#csvfile = "https://raw.githubusercontent.com/uwdata/draco/master/data/cars.csv"
csvfile = "./../../data/kaggle/IPL-2022.csv"
summary = lida.summarize(csvfile, summary_method="default", textgen_config=textgen_config)  
goals = lida.goals(summary, n=5, textgen_config=textgen_config)

for goal in goals:
    display(goal)

In [None]:
# goals can also be based on a persona 
# persona = "a mechanic who wants to buy a car that is cheap but has good gas mileage"
persona = "an enthusiastic sports fan who likes to use a casual tone and loves to know all the key stats of the game"
personal_goals = lida.goals(summary, n=5, persona=persona, textgen_config=textgen_config)
for goal in personal_goals:
    display(goal)

## 6. Generate Visualizations

In [None]:
# There are 5 goals above
# Visualizations worked for i=0, 2, 3
i = 0
print(goals[i])

library = "seaborn"
textgen_config = TextGenerationConfig(n=1, temperature=0.2, use_cache=True)
charts = lida.visualize(summary=summary, goal=goals[i], textgen_config=textgen_config, library=library)  
charts[0]

In [None]:
# There are 5 goals above
# Visualizations worked for i=0, 2, 3
i = 2
print(goals[i])

library = "seaborn"
textgen_config = TextGenerationConfig(n=1, temperature=0.2, use_cache=True)
charts = lida.visualize(summary=summary, goal=goals[i], textgen_config=textgen_config, library=library)  
charts[0]

In [None]:
# There are 5 goals above
# Visualizations worked for i=0, 2, 3
i = 3
print(goals[i])

library = "seaborn"
textgen_config = TextGenerationConfig(n=1, temperature=0.2, use_cache=True)
charts = lida.visualize(summary=summary, goal=goals[i], textgen_config=textgen_config, library=library)  
charts[0]
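
The three cells above repeat the same call for the goal indices that happened to work. A loop that records which goals yield a chart may be easier to maintain. The sketch below assumes only that the visualize call either returns a list (possibly empty) or raises. `collect_charts` and `stub_visualize` are hypothetical helpers, and the stub stands in for `lida.visualize` so the example runs without an API key.

```python
def collect_charts(visualize_fn, goals):
    """Try each goal once; return (charts_by_index, failed_indices)."""
    charts_by_index = {}
    failed = []
    for index, goal in enumerate(goals):
        try:
            charts = visualize_fn(goal)
        except Exception as error:  # generated code can fail in many ways
            print(f"goal {index} failed: {error}")
            failed.append(index)
            continue
        if charts:
            charts_by_index[index] = charts[0]  # keep the first chart per goal
        else:
            failed.append(index)  # no exception, but nothing was produced
    return charts_by_index, failed


def stub_visualize(goal):
    # Stand-in for lida.visualize: raises or returns nothing for some goals.
    if goal == "goal 1":
        raise ValueError("generated code did not run")
    if goal == "goal 4":
        return []
    return [f"chart for {goal}"]


goals_demo = [f"goal {n}" for n in range(5)]
charts_by_index, failed = collect_charts(stub_visualize, goals_demo)
print(sorted(charts_by_index), failed)
```

With a real manager, a wrapper such as `lambda g: lida.visualize(summary=summary, goal=g, textgen_config=textgen_config, library=library)` could be passed as `visualize_fn`.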

### 6.1 Generate visualization via a "user query"   

In [None]:
#user_query = "What is the average price of cars by type?"
user_query = "What is the average runs scored by a team in a given stadium?"
textgen_config = TextGenerationConfig(n=1, temperature=0.2, use_cache=True)
charts = lida.visualize(summary=summary, goal=user_query, textgen_config=textgen_config)  
charts[0]

In [None]:
user_query = "Who won the most cricket games?"
textgen_config = TextGenerationConfig(n=1, temperature=0.2, use_cache=True)
charts = lida.visualize(summary=summary, goal=user_query, textgen_config=textgen_config)  
charts[0]

# 7. VizOps

Given that LIDA represents visualizations as code, the VISGENERATOR also implements submodules to perform operations on this representation.

This includes:
- **Natural language based visualization refinement**: Provides a conversational API to iteratively refine generated code (e.g., translate chart to Hindi ... zoom in by 50%, etc.), which can then be executed to generate new visualizations. (Execution in a sandbox environment is recommended.)
- **Visualization explanations and accessibility**: Generates natural language explanations (valuable for debugging and sensemaking) as well as accessibility descriptions (valuable for supporting users with visual impairments).
- **Visualization code self-evaluation and repair**: Applies an LLM to self-evaluate generated code on multiple dimensions (see section 4.1.2).
- **Visualization recommendation**: Given some context (goals, or an existing visualization), recommend additional visualizations to the user (e.g., for comparison, or to provide additional perspectives).
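
Because these operations end with generated code being executed, the recommendation to isolate execution is worth illustrating. The sketch below runs a code string in a separate Python process with a timeout. This is process isolation only, not a real sandbox (it places no restrictions on the filesystem or network), and it is not how LIDA executes code. `run_isolated` is a hypothetical helper.

```python
import subprocess
import sys


def run_isolated(code, timeout_seconds=10):
    """Run `code` in a child Python process; return (ok, stdout, stderr)."""
    try:
        completed = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return False, "", f"timed out after {timeout_seconds}s"
    return completed.returncode == 0, completed.stdout, completed.stderr


ok, out, err = run_isolated("print(sum(range(5)))")
bad_ok, _, bad_err = run_isolated("raise RuntimeError('boom')")
print(ok, out.strip(), bad_ok)
```

The timeout guards against runaway code, and a failure in the child process surfaces as a non-zero return code rather than crashing the notebook kernel.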



### 7.1 Natural language based visualization refinement 

Given some code, modify it based on natural language instructions. This yields a new code snippet that can be executed to generate a new visualization.

In [None]:
code = charts[0].code
textgen_config = TextGenerationConfig(n=1, temperature=0, use_cache=True)
instructions = ["make the chart height and width equal", "change the color of the chart to red", "translate the chart to spanish"]
edited_charts = lida.edit(code=code,  summary=summary, instructions=instructions, library=library, textgen_config=textgen_config)
edited_charts[0]

### 7.2 Visualization explanations and accessibility

In [None]:
explanations = lida.explain(code=code, library=library, textgen_config=textgen_config) 
for row in explanations[0]:
    print(row["section"]," ** ", row["explanation"])

### 7.3 Visualization code self-evaluation and repair

In [None]:
evaluations = lida.evaluate(code=code, goal=goals[i], textgen_config=textgen_config, library=library)[0]
# use a loop variable that does not shadow the built-in eval()
for evaluation in evaluations:
    print(evaluation["dimension"], "Score", evaluation["score"], "/ 10")
    print("\t", evaluation["rationale"][:200])
    print("\t**********************************")
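
The per-dimension scores can also be reduced to a single number or filtered for weak spots. This is a minimal sketch, assuming each evaluation is a dict with the `dimension`, `score`, and `rationale` keys used above. The sample values and the threshold of 7 are made up for illustration, and `summarize_scores` is a hypothetical helper.

```python
from statistics import mean


def summarize_scores(evaluations, threshold=7):
    """Return the mean score and the dimensions scoring below `threshold`."""
    scores = [float(item["score"]) for item in evaluations]
    weak = [item["dimension"] for item in evaluations if float(item["score"]) < threshold]
    return mean(scores), weak


# Made-up sample in the same shape the loop above expects.
sample_evaluations = [
    {"dimension": "bugs", "score": 10, "rationale": "No syntax or logic errors."},
    {"dimension": "type", "score": 6, "rationale": "A bar chart would fit better."},
    {"dimension": "aesthetics", "score": 8, "rationale": "Readable labels and title."},
]

average, weak_dimensions = summarize_scores(sample_evaluations)
print(f"average score: {average:.1f} / 10, needs work: {weak_dimensions}")
```

The dimensions flagged as weak are natural candidates for a follow-up edit instruction.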

### 7.4 Visualization recommendation

In [None]:
textgen_config = TextGenerationConfig(n=2, temperature=0.2, use_cache=True)
recommended_charts =  lida.recommend(code=code, summary=summary, n=2,  textgen_config=textgen_config)

In [None]:
print(f"Recommended {len(recommended_charts)} charts")
for chart in recommended_charts:
    display(chart) 

## Infographics (Beta)

- Explores using LIDA to generate infographics from an existing visualization 
- Uses the `peacasso` package, and loads open source stable diffusion models 
- You will need to run `pip install lida[infographics]` to install the required dependencies.
- Currently a work in progress. Planned work includes post-processing infographics with chart axis and title overlays from the original visualization, adding presets for different infographic styles, and adding more stable diffusion models.


In [None]:
# !pip install lida[infographics] 
# ensure you have a GPU runtime

In [None]:
# 🚨 | Uncomment below to try it out only if you have access to 
#      a GPU Runtime in your GitHub Codespaces or Docker Desktop

# infographics = lida.infographics(visualization = edited_charts[0].raster, n=1, style_prompt="pastel art, green pearly rain drops, highly detailed, no blur, white background")

In [None]:
# 🚨 | Uncomment below to try it out only if you have access to 
#      a GPU Runtime in your GitHub Codespaces or Docker Desktop
#      and successfully ran the previous cell
from lida.utils import plot_raster
# plot_raster([edited_charts[0].raster, infographics["images"][0]]) 