Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel $\rightarrow$ Restart) and then **run all cells** (in the menubar, select Cell $\rightarrow$ Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and email below:

In [None]:
# Full name
NAME = ""
# Institutional email (hm.edu or hmtm.de)
EMAIL = ""

---

# Visualizing conversation data

+ **AI in Culture and Arts - Tech Crash Course**
+ **Date:** 13.06.2024
+ **Author:** Lenny Martinez Dominguez, Ph.D candidate at Sorbonne Université

<a href="https://colab.research.google.com/github/aica-wavelab/aica-assignments/blob/main/A6_conversation_analysis_and_visualization/2_conversation_visualization.ipynb" target="_parent">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

## 1. Introduction

In this notebook we will explore conversation analysis through visualization. 


### Assignment

Sketch and implement a creative data visualization of a conversation of your choice. You can use the provided data (Romeo and Juliet) or import a private WhatsApp conversation.

<div class="alert alert-block alert-warning">
<b>Instruction:</b> Do not share any of your conversation data in the assignment! Please consider the privacy of the people involved in the conversation and apply the anonymization techniques from the previous notebook.
</div>

### Installation requirements
Execute the following cell to install the packages needed for this notebook.

In [None]:
!pip install seaborn matplotlib plotly numpy pandas

---

## 2. Data visualization examples:
Below are some examples that could serve as inspiration for the kinds of visualizations you could make. We'll walk through the ones marked with `*` together. Many of them are interactive, and most take some time to read and explore.

### Simple charts can tell a story:
- [Who is the Biggest Pop Star?](https://pudding.cool/2019/03/pop-music/)*
- [Graphics for the Publications Office of the EU](https://www.behance.net/gallery/188704943/Graphics-for-the-Publications-Office-of-the-EU)*
- [The Rising of Olympic Mountains](https://projects.christianlaesser.com/olympics/)*
- [Travel Visa Inequality](https://projects.christianlaesser.com/travel-visa-inequality/)

### You can visualize your entire life, if you wanted
- Nick Felton's [2013 Annual Report](http://feltron.com/FAR13.html)*

### You can look to other parts of the world:
- [2024 European elections results: Explore our map and view the make-up of the future Parliament](https://www.lemonde.fr/en/international/article/2024/06/09/2024-european-elections-results-explore-our-map-and-view-the-make-up-of-the-future-parliament_6674304_4.html)*
- [Mona Chalabi on storytelling, the power of data, and covering Palestine](https://www.theverge.com/24093294/mona-chalabi-interview-palestine-gaza-data-viz)

### Sometimes animation is good for getting a point across
- ["Land doesn't vote, people do! French edition."](https://x.com/karim_douieb/status/1800777148871188766)*

### Some visualizations take time to understand and read
- [La Lettura - VISUAL DATA](https://www.flickr.com/photos/accurat/albums/72157632185046466/)*

### Visualizations can be pretty and colorful!
- [Multiplicity: A collective photographic city portrait](https://truth-and-beauty.net/projects/multiplicity)
- [Why do cats & dogs ...?](https://whydocatsanddogs.com/)*

---

## 3. Visualization ideas

### Starting from the dataset

In the previous notebook we computed and logged the following features that we can use when creating visualizations:

- `char_count` : the length of each message in characters
- `question_count`: the number of questions in a message
- `sentiment_score`: positivity or negativity score of a message (between -1 and 1)

For WhatsApp conversations, we can also compute:
- `time_diff_seconds` (WhatsApp only): the time difference in seconds from the previous message
- `media_sent` (WhatsApp only): the presence of media (audio, file, document, photo, sticker, video) in a message
- `emoji_count` (WhatsApp only): the number of emojis in a message

### Leading with questions

In data visualization, it's best to lead with curiosity and make visualizations that help answer questions. With these columns and the types of conversations we have (the play or a conversation with a friend), some questions might be:

- How long do we take to respond to each other? (WhatsApp data)
- Who talks the most? Who sends the most messages?
- Who sends the longest messages? What are those messages about? 
- Is someone more negative or positive than the other? Is there any time that pattern flips?
- What's the most common word we use? When and how often do we each use it?

These are all questions that can be answered in code and spreadsheets, but we can also answer them using simple graphics like _bar charts_, _line charts_, _histograms_, and _scatterplots_.

<div class="alert alert-info">
<b>Instruction:</b> Let's spend some time looking at your analyzed data (maybe in Excel!). Come up with three questions you would like the data to answer. We'll discuss as a group afterwards.
</div>

YOUR ANSWER HERE

### What do you need to know to answer the question?

After you decide on some questions, it's important to make sure you have data (or can collect/compute data) that can answer your question. For each of the questions you chose, identify what columns from the dataset you need to answer the question.
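As a minimal sketch of this check: "Who sends the most messages?" needs only the `sender` column plus a grouping step, while "Which messages contain questions?" needs `question_count` plus a filter. The rows below are made up for illustration and are not the course dataset; the column names follow the ones used in this notebook.

```python
import pandas as pd

# Made-up example rows; replace with your own conversation DataFrame.
example_df = pd.DataFrame(
    {
        "sender": ["Romeo", "Juliet", "Romeo", "Juliet", "Romeo"],
        "char_count": [42, 120, 15, 60, 33],
        "question_count": [0, 2, 1, 0, 0],
    }
)

# Grouping: how many messages did each sender write?
messages_per_sender = example_df.groupby("sender").size()

# Filtering: keep only the messages that contain at least one question.
messages_with_questions = example_df[example_df["question_count"] > 0]

print(messages_per_sender)
print(messages_with_questions)
```

If a question needs a column that is not in the table yet, that is a sign you have to compute or collect it first.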

<div class="alert alert-info">
<b>Instruction:</b> Let's spend some time looking at your analyzed data (maybe in Excel!) and the questions you chose. For each question, identify what information you already have to answer it and what additional steps you may need. Grouping and filtering are common steps, even when the column already exists.
</div>

YOUR ANSWER HERE

### Ideate and Sketch

Now that you have questions, and you have an idea of what parts of the dataset you need to answer them, it's a good time to sketch out the kind of graphics you might want to make.

#### _Data to Viz_ & _Python Graph Gallery_
Data visualizations can be used to explore Distributions, Correlations, Rankings, Parts of a whole, Evolution, Flows, as well as geography through Maps. [Data to Viz](https://www.data-to-viz.com/) is a good resource for learning more about the different kinds of visualizations that fit into these categories, and it points to code for making these graphics in different programming languages. A Python-specific version is the [Python Graph Gallery](https://python-graph-gallery.com/).

For today, these particular chart types might be good enough to focus on:
- [Histogram](https://python-graph-gallery.com/histogram/)
- [Scatterplot](https://python-graph-gallery.com/scatter-plot/) and [Bubble plot](https://python-graph-gallery.com/bubble-plot/)
- [Lollipop plot](https://python-graph-gallery.com/lollipop-plot/) and [Bar chart](https://python-graph-gallery.com/barplot/)
- [Line Chart](https://python-graph-gallery.com/line-chart/)

<div class="alert alert-info">
<b>Instruction:</b> Let's spend some time ideating and sketching different graphics for the questions you want to answer. For each question, sketch at least two different graphics from the list above. What do you think your charts would look like with the data you have? Below, write which charts you tried sketching for each question.
</div>

YOUR ANSWER HERE

---

## 4. Example visualizations

In this part I'll show you how I would create graphics for each of the following questions:

1. Who sends the longest messages?
1. Who asks the most questions?
1. Is someone more negative or positive than the other? Is there any time that pattern flips?
1. How long do we take to respond to each other? (WhatsApp data)

First we want to import the libraries and load our data.

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

conversation_df = pd.read_csv('data/anonymized_conversation_features.csv')
conversation_df.head()
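One thing worth checking after loading: `pd.read_csv` reads timestamps as plain text unless told otherwise, and a time axis built from text is treated as categories rather than as a continuous timeline. The sketch below assumes your file has a `date_time` column (as used in the time-based plots later on) and uses a made-up two-row CSV so that it runs on its own.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a real file with a `date_time` column.
csv_text = """date_time,sender,sentiment_score
2024-06-01 09:15:00,Person A,0.4
2024-06-01 09:17:30,Person B,-0.1
"""

example_df = pd.read_csv(io.StringIO(csv_text))

# Convert the text column to real timestamps so plots get a proper time axis.
example_df["date_time"] = pd.to_datetime(example_df["date_time"])

print(example_df.dtypes)
```

For your own data, the same conversion would be applied to `conversation_df["date_time"]`, provided that column exists in your file.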

Now we will use Seaborn to build the visualizations.

### Q1: Who sends the longest messages?

We can use a histogram to answer this question. We'll need the `char_count` since that is the message length, and we'll need the `sender` to count things by speaker.

I used the documentation on the Seaborn website about [`.histplot()`](https://seaborn.pydata.org/generated/seaborn.histplot.html) to add extra features that make the graphic more legible.

In [None]:
plt.figure(figsize=(12, 6))
sns.histplot(data=conversation_df, x='char_count', hue='sender', element="step", alpha=0.4)
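A histogram shows the overall shape of the distribution. To put numbers next to it, and to look up what the longest messages actually say, a short per-sender summary can help. This is a sketch on made-up rows; the `message` column name is an assumption, so use whatever your text column is called.

```python
import pandas as pd

# Made-up example rows; `message` is a hypothetical name for the text column.
example_df = pd.DataFrame(
    {
        "sender": ["Romeo", "Juliet", "Romeo", "Juliet"],
        "message": ["Hi", "Wherefore art thou?", "But soft, what light", "Ay me"],
        "char_count": [2, 19, 20, 5],
    }
)

# Median and maximum message length per sender.
length_summary = example_df.groupby("sender")["char_count"].agg(["median", "max"])

# Row label of each sender's longest message, then look those rows up.
longest_rows = example_df.loc[example_df.groupby("sender")["char_count"].idxmax()]

print(length_summary)
print(longest_rows[["sender", "message", "char_count"]])
```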

### Q2: Who asks the most questions?

Let's try a barplot to answer this question. We'll need the `question_count` column (or feature), as well as the `sender` to plot this.

Here's the Seaborn page for [`.barplot()`](https://seaborn.pydata.org/generated/seaborn.barplot.html).

In [None]:
plt.figure(figsize=(12, 6))
# Plot question_count (not char_count) so the bars answer the question above.
sns.barplot(
    data=conversation_df, x="question_count", y="sender", hue="sender", errorbar=None, palette="viridis"
)
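Keep in mind that `sns.barplot` aggregates each group with an estimator (the mean, unless you pass another one), so the bars can show questions per message rather than questions overall. Checking the total and the per-message average side by side makes the difference visible. A sketch on made-up rows:

```python
import pandas as pd

# Made-up example rows: Romeo writes more messages, Juliet packs more questions into one.
example_df = pd.DataFrame(
    {
        "sender": ["Romeo", "Romeo", "Romeo", "Juliet"],
        "question_count": [1, 1, 1, 2],
    }
)

# Total questions and average questions per message, for each sender.
question_summary = example_df.groupby("sender")["question_count"].agg(["sum", "mean"])

print(question_summary)
```

In this toy table Romeo asks the most questions in total, while Juliet asks the most per message, so the two readings of the question give different answers.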

### Q3: Is someone more negative or positive than the other? Is there any time that pattern flips?

Let's try a scatterplot to answer this question. We'll need the `sentiment_score` feature, as well as the `date_time` and `sender` features, to plot this.

Here's the Seaborn page for [`.scatterplot()`](https://seaborn.pydata.org/generated/seaborn.scatterplot.html).

In [None]:
plt.figure(figsize=(12, 6))
sns.scatterplot(data=conversation_df, x="date_time", y="sentiment_score", hue="sender")

It's hard to tell anything from this scatterplot. Every message is quite positive. Could we try making the size match the length of the message? A short very positive message might be different from a long very positive message.

In [None]:
plt.figure(figsize=(12, 6))
sns.scatterplot(data=conversation_df, x="date_time", y="sentiment_score", hue="sender", size="char_count", sizes=(20, 200))

We could say it's pretty but it's not easy to see how sentiment changes over time. For that we need to really connect the dots in the scatterplot. Let's try a line chart (or [lineplot](https://seaborn.pydata.org/generated/seaborn.lineplot.html) as Seaborn calls them).

In [None]:
plt.figure(figsize=(12, 6))
sns.lineplot(
    data=conversation_df,
    x="date_time",
    y="sentiment_score",
    hue="sender",
    marker="o"
)

Now we can see just how the sentiment changes over time. We can try to overlay the size of the dots as well:

In [None]:
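plt.figure(figsize=(14, 10))
# Lines connect each sender's messages; hide this legend so senders are not listed twice.
sns.lineplot(
    data=conversation_df, x="date_time", y="sentiment_score", hue="sender", legend=False)

# Dots on top of the lines, sized by message length.
sns.scatterplot(
    data=conversation_df,
    x="date_time",
    y="sentiment_score",
    hue="sender",
    size="char_count",
    sizes=(20, 200),
)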
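### Q4: How long do we take to respond to each other? (WhatsApp data)

One possible approach, sketched here under assumptions rather than taken from the course solution: treat a message as a response only when the sender changes, and measure the seconds since the previous message. The rows below are made up. If your data already has `time_diff_seconds` from the previous notebook, you can skip the `diff()` step.

```python
import pandas as pd

# Made-up rows standing in for a WhatsApp export with a `date_time` column.
example_df = pd.DataFrame(
    {
        "date_time": pd.to_datetime(
            [
                "2024-06-01 09:00:00",
                "2024-06-01 09:02:00",
                "2024-06-01 09:02:30",
                "2024-06-01 09:10:30",
            ]
        ),
        "sender": ["Person A", "Person B", "Person B", "Person A"],
    }
).sort_values("date_time")

# Seconds since the previous message (NaN for the first message).
example_df["time_diff_seconds"] = example_df["date_time"].diff().dt.total_seconds()

# A message counts as a response only when the sender differs from the previous one.
is_response = example_df["sender"] != example_df["sender"].shift()
responses_df = example_df[is_response & example_df["time_diff_seconds"].notna()]

# Median response time per sender, in seconds.
median_response = responses_df.groupby("sender")["time_diff_seconds"].median()

print(responses_df)
print(median_response)
```

The resulting `responses_df` can go into the same kind of Seaborn call as above, for example `sns.boxplot(data=responses_df, x="time_diff_seconds", y="sender")`, to compare the spread of response times per person.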

---

## 5. Try it yourself

Make three graphics using your conversation data. Write out the question you are trying to answer with each graphic, as well as what data columns you need. Then when you make the graphic, write your interpretation of it, and what your answer to the question would be.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()