# 1. Project LIDA

LIDA is a library for generating data visualizations and data-faithful infographics. LIDA is grammar agnostic (it works with any programming language and visualization library, e.g., matplotlib, seaborn, altair, d3) and works with multiple large language model providers (OpenAI, Azure OpenAI, PaLM, Cohere, Hugging Face). The components of LIDA are described in detail in [this paper](https://arxiv.org/abs/2303.02927); star [this project](https://aka.ms/lida/github) for updates.

LIDA _treats visualizations as code_ and provides a clean API for generating, executing, editing, explaining, evaluating and repairing visualization code. Here are some of the tasks you can perform with LIDA:

- ✅ Data Summarization
- ✅ Goal Generation
- ✅ Visualization Generation
- ✅ Visualization Editing
- ✅ Visualization Explanation
- ✅ Visualization Evaluation and Repair
- ✅ Visualization Recommendation
- ✅ Infographic Generation (beta; requires `pip install lida[infographics]`)

![LIDA Modules illustrated](https://github.com/microsoft/lida/raw/main/docs/images/lidamodules.jpg)

## 1. Data Summarization
Given a dataset, generate a compact natural language summary of that data that serves as context for subsequent tasks. The goal of the summarizer is to _produce a dense but compact information summary for a given dataset that is useful as grounding context for visualization tasks_. Grounding context here means the information an analyst would need to understand the dataset and the tasks that can be performed on it.

See the [paper](https://arxiv.org/pdf/2303.02927.pdf) for details.
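To make "grounding context" concrete, here is a minimal, stdlib-only sketch that computes per-column statistics of the kind a summarizer could hand to an LLM. The helpers `infer_dtype` and `summarize_rows`, the property names (`dtype`, `num_unique_values`, `samples`, `min`, `max`, `mean`) and the toy rows are all assumptions for illustration; this is not LIDA's actual summarizer.

```python
import csv
import io
import json
import statistics


def infer_dtype(values):
    """Return "number" if every value parses as a float, otherwise "string"."""
    try:
        for v in values:
            float(v)
        return "number"
    except ValueError:
        return "string"


def summarize_rows(rows, n_samples=3):
    """Build a compact, JSON-serializable summary from a list of dict rows (hypothetical helper)."""
    fields = []
    for column in rows[0]:
        non_empty = [row[column] for row in rows if row[column] != ""]
        dtype = infer_dtype(non_empty)
        props = {
            "dtype": dtype,
            "num_unique_values": len(set(non_empty)),
            # dict.fromkeys de-duplicates while keeping first-seen order
            "samples": list(dict.fromkeys(non_empty))[:n_samples],
        }
        if dtype == "number" and non_empty:
            numbers = [float(v) for v in non_empty]
            props.update(min=min(numbers), max=max(numbers), mean=round(statistics.mean(numbers), 2))
        fields.append({"column": column, "properties": props})
    return {"num_rows": len(rows), "fields": fields}


# A few made-up rows standing in for the IPL CSV used in this notebook
toy_csv = io.StringIO(
    "team,venue,runs\n"
    "Mumbai,Wankhede,180\n"
    "Chennai,Brabourne,165\n"
    "Mumbai,DY Patil,151\n"
)
toy_summary = summarize_rows(list(csv.DictReader(toy_csv)))
print(json.dumps(toy_summary, indent=2))
```

Serializing a dictionary like this to text is one simple way to pass a dataset summary to an LLM as context for later steps.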

In [None]:
# Setup
from lida import Manager, TextGenerationConfig, llm

csvfile = "./../../data/kaggle/IPL-2022.csv"
lida = Manager(text_gen=llm("openai"))  # other providers, e.g. llm("palm") or llm("cohere")
textgen_config = TextGenerationConfig(n=1, temperature=0.5, model="gpt-3.5-turbo-0301", use_cache=True)

In [None]:
# Summarize the dataset and print each summary field
summary = lida.summarize(csvfile)
for key in summary.keys():
    print(key, ":", summary[key])

## 2. Goal Generation

Given the dataset "context" generated by the summarizer, the LLM must now _generate a question (hypothesis), a visualization (that addresses the question) and a rationale (for that visualization)_. The LIDA paper reports that requiring the LLM to produce a rationale led to more semantically meaningful goals.

The goal generation API takes the summary, the number of goals to generate (`n`), an optional `persona` that influences the tone or context of the generated goals, and a `textgen_config` that configures the parameters of the underlying model.

See the [paper](https://arxiv.org/pdf/2303.02927.pdf) for details.
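Because each goal bundles a question, a visualization and a rationale, a goal generator typically asks the LLM for structured output and parses it. The sketch below assumes the model replies with a JSON list of objects with `question`, `visualization` and `rationale` keys; `ToyGoal`, `build_goal_prompt` and `parse_goals` are hypothetical helpers and the reply is hand-written, so this shows the idea rather than LIDA's internals.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyGoal:
    index: int
    question: str
    visualization: str
    rationale: str


def build_goal_prompt(summary_text: str, n: int, persona: Optional[str] = None) -> str:
    """Assemble a goal-generation prompt; the persona only changes the framing."""
    who = persona or "a data analyst exploring the dataset"
    return (
        f"You are helping {who}.\n"
        f"Dataset summary:\n{summary_text}\n"
        f"Return a JSON list of {n} objects with keys "
        '"question", "visualization" and "rationale".'
    )


def parse_goals(llm_reply: str) -> List[ToyGoal]:
    """Turn the model's JSON reply into typed goal records."""
    items = json.loads(llm_reply)
    return [
        ToyGoal(index=i, question=item["question"], visualization=item["visualization"], rationale=item["rationale"])
        for i, item in enumerate(items)
    ]


# A hand-written reply standing in for a real LLM response
fake_reply = json.dumps([
    {
        "question": "How many matches did each team win?",
        "visualization": "bar chart of wins per team",
        "rationale": "Compares team performance at a glance.",
    }
])
goal_prompt = build_goal_prompt("columns: team, venue, runs", n=1, persona="a fan of the Mumbai team")
parsed_goals = parse_goals(fake_reply)
print(goal_prompt)
print(parsed_goals[0].visualization)
```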

In [None]:
# Generate 5 goals from the summary, using the persona of a Mumbai team fan
goals = lida.goals(summary, n=5, textgen_config=textgen_config, persona="fan of the Mumbai team who wants to see their stats")

# Display each generated goal (display() is available in Jupyter/IPython)
for goal in goals:
    display(goal)

In [None]:

# Generate 10 goals from the summary with the default persona
goals = lida.goals(summary, n=10, textgen_config=textgen_config)

# Create a list of dictionaries containing the goal information and show it as a table
import pandas as pd
goal_list = []
for goal in goals:
    goal_dict = {'Question': goal.question, 'Visualization': goal.visualization, 'Rationale': goal.rationale}
    goal_list.append(goal_dict)
df = pd.DataFrame(goal_list)
display(df)

In [None]:
# Visualize a goal
charts = lida.visualize(summary=summary, goal=goals[0])
print("Charts length:", len(charts))
charts[0]

In [None]:
# Visualize a goal and specify a library
target = goals[2]
library = "matplotlib"
charts = lida.visualize(summary=summary, goal=target, library=library)
charts[0]

In [None]:
# Visualize it again with a different library and textgen_config (lower temperature)
target = goals[2]
library = "seaborn"
textgen_config = TextGenerationConfig(n=1, temperature=0.2, use_cache=True)
charts = lida.visualize(summary=summary, goal=target, library=library, textgen_config=textgen_config)
charts[0]

In [None]:
# Visualize a free-text user query instead of a generated goal
user_query = "What is the frequency of toss decisions based on team?"
textgen_config = TextGenerationConfig(n=1, temperature=0.2, use_cache=True)
charts = lida.visualize(summary=summary, goal=user_query, textgen_config=textgen_config)  
charts[0]

In [None]:
# Edit the visualization: modify it using natural language instructions
instructions = ["change the color to green", "translate the title to french"]
edited_charts = lida.edit(code=charts[0].code, summary=summary, instructions=instructions)
edited_charts[0]

### 4.3 Caching
Each manager method takes a `textgen_config` argument, a `TextGenerationConfig` object that configures the text generation process (with parameters such as `model`, `temperature`, `max_tokens` and `top_k`). One of its fields is `use_cache`. If it is set to `True`, the manager caches the generated text for that method call. Enable it to speed up repeated calls and to avoid hitting API rate limits.
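The effect of `use_cache` can be pictured as memoization keyed on the method, the prompt and the generation settings. The sketch below is a hypothetical in-memory version: `ToyGenConfig`, `fake_llm` and `cached_generate` are made up for illustration, and LIDA's actual cache implementation and storage may differ. In a cache like this, re-running an identical call returns the stored result, so you would change a setting such as `temperature` to get a fresh generation.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ToyGenConfig:
    # Mirrors a few TextGenerationConfig fields mentioned above; not the real class
    n: int = 1
    temperature: float = 0.5
    model: str = "gpt-3.5-turbo-0301"
    use_cache: bool = True


_cache = {}
call_count = 0


def fake_llm(prompt: str, config: ToyGenConfig) -> str:
    """Stand-in for a paid API call; counts how often it is actually invoked."""
    global call_count
    call_count += 1
    return f"response #{call_count} to: {prompt}"


def cached_generate(method: str, prompt: str, config: ToyGenConfig) -> str:
    if not config.use_cache:
        return fake_llm(prompt, config)
    # Key on everything that influences the output: method, prompt and all settings
    key_material = json.dumps({"method": method, "prompt": prompt, **asdict(config)}, sort_keys=True)
    key = hashlib.sha256(key_material.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = fake_llm(prompt, config)
    return _cache[key]


cfg = ToyGenConfig()
first = cached_generate("goals", "summarize IPL data", cfg)
second = cached_generate("goals", "summarize IPL data", cfg)  # served from the cache
third = cached_generate("goals", "summarize IPL data", ToyGenConfig(temperature=0.2))  # new key
print(first == second, call_count)
```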

In [None]:
# !pip install lida 
# !pip install lida[infographics] # for infographics support

In [None]:
from lida import Manager, TextGenerationConfig, llm

## 5. Summarize Data, Generate Goals

In [None]:
lida = Manager(text_gen=llm("openai", api_key=None))  # replace None with your API key, or set the OPENAI_API_KEY environment variable
textgen_config = TextGenerationConfig(n=1, temperature=0.5, model="gpt-3.5-turbo-0301", use_cache=True)
#csvfile = "https://raw.githubusercontent.com/uwdata/draco/master/data/cars.csv"
csvfile = "./../../data/kaggle/IPL-2022.csv"
summary = lida.summarize(csvfile, summary_method="default", textgen_config=textgen_config)  
goals = lida.goals(summary, n=5, textgen_config=textgen_config)

for goal in goals:
    display(goal)

In [None]:
# Goals can also be based on a persona
# persona = "a mechanic who wants to buy a car that is cheap but has good gas mileage"
persona = "an enthusiastic sports fan who likes to use a casual tone and loves to know all the key stats of the game"
personal_goals = lida.goals(summary, n=5, persona=persona, textgen_config=textgen_config)
for goal in personal_goals:
    display(goal)

## 6. Generate Visualizations

In [None]:
# There are 5 goals above
# Visualizations worked for i=0, 2, 3
i = 0
print(goals[i])

library = "seaborn"
textgen_config = TextGenerationConfig(n=1, temperature=0.2, use_cache=True)
charts = lida.visualize(summary=summary, goal=goals[i], textgen_config=textgen_config, library=library)  
charts[0]

In [None]:
# There are 5 goals above
# Visualizations worked for i=0, 2, 3
i = 2
print(goals[i])

library = "seaborn"
textgen_config = TextGenerationConfig(n=1, temperature=0.2, use_cache=True)
charts = lida.visualize(summary=summary, goal=goals[i], textgen_config=textgen_config, library=library)  
charts[0]

In [None]:
# There are 5 goals above
# Visualizations worked for i=0, 2, 3
i = 3
print(goals[i])

library = "seaborn"
textgen_config = TextGenerationConfig(n=1, temperature=0.2, use_cache=True)
charts = lida.visualize(summary=summary, goal=goals[i], textgen_config=textgen_config, library=library)  
charts[0]

### 6.1 Generate visualization via a "user query"   

In [None]:
#user_query = "What is the average price of cars by type?"
user_query = "What is the average runs scored by a team in a given stadium?"
textgen_config = TextGenerationConfig(n=1, temperature=0.2, use_cache=True)
charts = lida.visualize(summary=summary, goal=user_query, textgen_config=textgen_config)  
charts[0]
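Passing a plain string as `goal`, as in the cell above, works because a free-text query can be wrapped into a goal record before code generation. A minimal sketch of that normalization, assuming a goal has `question`, `visualization` and `rationale` fields (the `QueryGoal` class and `as_goal` helper are hypothetical, not LIDA's API):

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class QueryGoal:
    question: str
    visualization: str
    rationale: str = ""


def as_goal(goal: Union[str, dict, QueryGoal]) -> QueryGoal:
    """Normalize a free-text query, a dict, or an existing goal into one record type."""
    if isinstance(goal, QueryGoal):
        return goal
    if isinstance(goal, dict):
        return QueryGoal(**goal)
    # A bare query doubles as both the question and the visualization hint
    return QueryGoal(question=goal, visualization=goal)


normalized = as_goal("What is the average runs scored by a team in a given stadium?")
print(normalized)
```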

In [None]:
user_query = "Who won the most cricket games?"
textgen_config = TextGenerationConfig(n=1, temperature=0.2, use_cache=True)
charts = lida.visualize(summary=summary, goal=user_query, textgen_config=textgen_config)  
charts[0]

## 7. VizOps

Given that LIDA represents visualizations as code, the VISGENERATOR also implements submodules to perform operations on this representation. These include:

- **Natural language based visualization refinement**: Provides a conversational API to iteratively refine generated code (e.g., translate the chart to Hindi, zoom in by 50%, etc.), which can then be executed to generate new visualizations. Execution in a sandbox environment is recommended.
- **Visualization explanations and accessibility**: Generates natural language explanations (valuable for debugging and sensemaking) as well as accessibility descriptions (valuable for supporting users with visual impairments).
- **Visualization code self-evaluation and repair**: Applies an LLM to self-evaluate generated code on multiple dimensions (see Section 4.1.2 of the paper).
- **Visualization recommendation**: Given some context (goals, or an existing visualization), recommends additional visualizations to the user (e.g., for comparison, or to provide additional perspectives).
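
The paper recommends executing generated code in a sandbox. As a rough illustration only, the sketch below runs a code string in a separate Python process with a timeout, using the standard `subprocess` module. A child process isolates crashes and hangs but is not a security sandbox; untrusted code would need container- or OS-level isolation. `run_generated_code` is a hypothetical helper, not part of LIDA.

```python
import subprocess
import sys


def run_generated_code(code: str, timeout_s: float = 10.0) -> dict:
    """Run `code` in a fresh interpreter; capture output and errors instead of crashing the caller."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": f"timed out after {timeout_s}s"}
    return {"ok": proc.returncode == 0, "stdout": proc.stdout, "stderr": proc.stderr}


good = run_generated_code("print(sum(range(5)))")
bad = run_generated_code("raise ValueError('bad column name')")
print(good)
print(bad["ok"], bad["stderr"].strip().splitlines()[-1])
```

The captured `stderr` of a failed run is the kind of signal a repair step could feed back to the LLM.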



### 7.1 Natural language based visualization refinement 

Given some code, modify it based on natural language instructions. This yields a new code snippet that can be executed to generate a new visualization.
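Conversational refinement can be pictured as repeatedly transforming the current code. The sketch below shows one way to structure that: apply each instruction to the code produced by the previous step and keep every intermediate version so you can roll back. A toy rule-based editor stands in for the LLM; `apply_instructions` and `toy_editor` are hypothetical, and LIDA's own `edit` call may handle the instruction list differently (for example, in a single request).

```python
from typing import Callable, List


def apply_instructions(code: str, instructions: List[str], editor: Callable[[str, str], str]) -> List[str]:
    """Apply instructions one at a time; return every version, starting with the original."""
    versions = [code]
    for instruction in instructions:
        versions.append(editor(versions[-1], instruction))
    return versions


def toy_editor(current_code: str, instruction: str) -> str:
    """Rule-based stand-in for an LLM: handles only two hard-coded requests."""
    if instruction == "change the color to green":
        return current_code.replace('color="blue"', 'color="green"')
    if instruction == "translate the title to french":
        return current_code.replace('title="Wins per team"', 'title="Victoires par équipe"')
    return current_code  # unknown instruction: leave the code unchanged


original = 'ax.bar(teams, wins, color="blue")\nax.set(title="Wins per team")'
history = apply_instructions(original, ["change the color to green", "translate the title to french"], toy_editor)
print(history[-1])
```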

In [None]:
code = charts[0].code
textgen_config = TextGenerationConfig(n=1, temperature=0, use_cache=True)
instructions = ["make the chart height and width equal", "change the color of the chart to red", "translate the chart to spanish"]
edited_charts = lida.edit(code=code, summary=summary, instructions=instructions, library=library, textgen_config=textgen_config)
edited_charts[0]

### 7.2 Visualization explanations and accessibility

In [None]:
explanations = lida.explain(code=code, library=library, textgen_config=textgen_config)
for row in explanations[0]:
    print(row["section"], " ** ", row["explanation"])

### 7.3 Visualization code self-evaluation and repair

In [None]:
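from lida.datamodel import Goal

# `code` came from the last user query (charts[0] above), so evaluate it against that query
query_goal = Goal(question=user_query, visualization=user_query, rationale="")
evaluations = lida.evaluate(code=code, goal=query_goal, textgen_config=textgen_config, library=library)[0]
for evaluation in evaluations:
    print(evaluation["dimension"], "Score", evaluation["score"], "/ 10")
    print("\t", evaluation["rationale"][:200])
    print("\t**********************************")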
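Each evaluation entry carries a `dimension`, a `score` out of 10 and a `rationale`, as the loop above prints. A natural next step is to pick out the weak dimensions and turn them into feedback for a repair step. The helpers below are a hypothetical sketch; the threshold of 7 and the sample entries are made up for illustration.

```python
from typing import Dict, List


def weak_dimensions(entries: List[Dict], threshold: float = 7) -> List[Dict]:
    """Return entries scoring below `threshold`, lowest score first."""
    low = [e for e in entries if float(e["score"]) < threshold]
    return sorted(low, key=lambda e: float(e["score"]))


def build_feedback(entries: List[Dict]) -> str:
    """Format weak dimensions as bullet-point feedback text for a repair prompt."""
    return "\n".join(f"- {e['dimension']} ({e['score']}/10): {e['rationale']}" for e in entries)


sample_evaluations = [
    {"dimension": "bugs", "score": 9, "rationale": "Code runs without errors."},
    {"dimension": "encoding", "score": 5, "rationale": "Team names overlap on the x axis."},
    {"dimension": "aesthetics", "score": 6, "rationale": "Default colors lack contrast."},
]
to_fix = weak_dimensions(sample_evaluations)
print(build_feedback(to_fix))
```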

### 7.4 Visualization Recommendation

In [None]:
textgen_config = TextGenerationConfig(n=2, temperature=0.2, use_cache=True)
recommended_charts = lida.recommend(code=code, summary=summary, n=2, textgen_config=textgen_config)

In [None]:
print(f"Recommended {len(recommended_charts)} charts")
for chart in recommended_charts:
    display(chart) 

## Infographics (Beta)

- Explores using LIDA to generate infographics from an existing visualization.
- Uses the `peacasso` package and loads open-source Stable Diffusion models.
- You will need to run `pip install lida[infographics]` to install the required dependencies.
- Currently a work in progress (ongoing work includes post-processing infographics with chart axis and title overlays from the original visualization, adding presets for different infographic styles, and adding more Stable Diffusion models).


In [None]:
# !pip install lida[infographics] 
# ensure you have a GPU runtime

In [None]:
# 🚨 | Uncomment below to try it out only if you have access to 
#      a GPU Runtime in your GitHub Codespaces or Docker Desktop

# infographics = lida.infographics(visualization = edited_charts[0].raster, n=1, style_prompt="pastel art, green pearly rain drops, highly detailed, no blur, white background")

In [None]:
# 🚨 | Uncomment below to try it out only if you have access to 
#      a GPU Runtime in your GitHub Codespaces or Docker Desktop
#      and successfully ran the previous cell
from lida.utils import plot_raster
# plot_raster([edited_charts[0].raster, infographics["images"][0]]) 