## Introduction

![shayna-douglas-TQV8qkwuEzA-unsplash.jpg](attachment:f9df4339-100e-4432-929e-c15edb18ba22.jpg)

I'm sure I'm not the only girl in the world whose favourite Harry Potter character is Hermione, that curly-haired smartass bookworm who nevertheless loves her friends above everything :)

In this notebook, I'm playing around with the very comprehensive [Harry Potter Movies Dataset](https://www.kaggle.com/datasets/maricinnamon/harry-potter-movies-dataset) to dig up some interesting insights about Hermione. Join me on Platform 9 3/4, we're setting off!

## Starting Off: Environment and Setup

First things first: let us do the necessary imports.

I will use:
* [pandas](https://pandas.pydata.org/) and [NumPy](https://numpy.org/) for data processing;
* [Plotly Express](https://plotly.com/python/plotly-express/), [Plotly Graph Objects](https://plotly.com/python/graph-objects/) and [matplotlib](https://matplotlib.org/) for plotting;
* [NLTK](https://www.nltk.org/) for convenient language processing;
* [WordCloud](https://amueller.github.io/word_cloud/index.html) to build the word cloud;
* [collections](https://docs.python.org/3/library/collections.html) to unlock some useful datatypes.

In [1]:
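import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
import nltk
from nltk.tokenize import word_tokenize
from wordcloud import WordCloud
from collections import Counter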


I will also use the X11 colour table (copied from the [Pillow docs](https://pillow.readthedocs.io/en/stable/_modules/PIL/ImageColor.html)). I'm pasting it into the next cell, hidden so that it doesn't get in the way of reading.

In [2]:
colormap = {
        # X11 colour table from https://drafts.csswg.org/css-color-4/, with
        # gray/grey spelling issues fixed.  This is a superset of HTML 4.0
        # colour names used in CSS 1.
        "aliceblue": "#f0f8ff",
        "antiquewhite": "#faebd7",
        "aqua": "#00ffff",
        "aquamarine": "#7fffd4",
        "azure": "#f0ffff",
        "beige": "#f5f5dc",
        "bisque": "#ffe4c4",
        "black": "#000000",
        "blanchedalmond": "#ffebcd",
        "blue": "#0000ff",
        "blueviolet": "#8a2be2",
        "brown": "#a52a2a",
        "burlywood": "#deb887",
        "cadetblue": "#5f9ea0",
        "chartreuse": "#7fff00",
        "chocolate": "#d2691e",
        "coral": "#ff7f50",
        "cornflowerblue": "#6495ed",
        "cornsilk": "#fff8dc",
        "crimson": "#dc143c",
        "cyan": "#00ffff",
        "darkblue": "#00008b",
        "darkcyan": "#008b8b",
        "darkgoldenrod": "#b8860b",
        "darkgray": "#a9a9a9",
        "darkgrey": "#a9a9a9",
        "darkgreen": "#006400",
        "darkkhaki": "#bdb76b",
        "darkmagenta": "#8b008b",
        "darkolivegreen": "#556b2f",
        "darkorange": "#ff8c00",
        "darkorchid": "#9932cc",
        "darkred": "#8b0000",
        "darksalmon": "#e9967a",
        "darkseagreen": "#8fbc8f",
        "darkslateblue": "#483d8b",
        "darkslategray": "#2f4f4f",
        "darkslategrey": "#2f4f4f",
        "darkturquoise": "#00ced1",
        "darkviolet": "#9400d3",
        "deeppink": "#ff1493",
        "deepskyblue": "#00bfff",
        "dimgray": "#696969",
        "dimgrey": "#696969",
        "dodgerblue": "#1e90ff",
        "firebrick": "#b22222",
        "floralwhite": "#fffaf0",
        "forestgreen": "#228b22",
        "fuchsia": "#ff00ff",
        "gainsboro": "#dcdcdc",
        "ghostwhite": "#f8f8ff",
        "gold": "#ffd700",
        "goldenrod": "#daa520",
        "gray": "#808080",
        "grey": "#808080",
        "green": "#008000",
        "greenyellow": "#adff2f",
        "honeydew": "#f0fff0",
        "hotpink": "#ff69b4",
        "indianred": "#cd5c5c",
        "indigo": "#4b0082",
        "ivory": "#fffff0",
        "khaki": "#f0e68c",
        "lavender": "#e6e6fa",
        "lavenderblush": "#fff0f5",
        "lawngreen": "#7cfc00",
        "lemonchiffon": "#fffacd",
        "lightblue": "#add8e6",
        "lightcoral": "#f08080",
        "lightcyan": "#e0ffff",
        "lightgoldenrodyellow": "#fafad2",
        "lightgreen": "#90ee90",
        "lightgray": "#d3d3d3",
        "lightgrey": "#d3d3d3",
        "lightpink": "#ffb6c1",
        "lightsalmon": "#ffa07a",
        "lightseagreen": "#20b2aa",
        "lightskyblue": "#87cefa",
        "lightslategray": "#778899",
        "lightslategrey": "#778899",
        "lightsteelblue": "#b0c4de",
        "lightyellow": "#ffffe0",
        "lime": "#00ff00",
        "limegreen": "#32cd32",
        "linen": "#faf0e6",
        "magenta": "#ff00ff",
        "maroon": "#800000",
        "mediumaquamarine": "#66cdaa",
        "mediumblue": "#0000cd",
        "mediumorchid": "#ba55d3",
        "mediumpurple": "#9370db",
        "mediumseagreen": "#3cb371",
        "mediumslateblue": "#7b68ee",
        "mediumspringgreen": "#00fa9a",
        "mediumturquoise": "#48d1cc",
        "mediumvioletred": "#c71585",
        "midnightblue": "#191970",
        "mintcream": "#f5fffa",
        "mistyrose": "#ffe4e1",
        "moccasin": "#ffe4b5",
        "navajowhite": "#ffdead",
        "navy": "#000080",
        "oldlace": "#fdf5e6",
        "olive": "#808000",
        "olivedrab": "#6b8e23",
        "orange": "#ffa500",
        "orangered": "#ff4500",
        "orchid": "#da70d6",
        "palegoldenrod": "#eee8aa",
        "palegreen": "#98fb98",
        "paleturquoise": "#afeeee",
        "palevioletred": "#db7093",
        "papayawhip": "#ffefd5",
        "peachpuff": "#ffdab9",
        "peru": "#cd853f",
        "pink": "#ffc0cb",
        "plum": "#dda0dd",
        "powderblue": "#b0e0e6",
        "purple": "#800080",
        "rebeccapurple": "#663399",
        "red": "#ff0000",
        "rosybrown": "#bc8f8f",
        "royalblue": "#4169e1",
        "saddlebrown": "#8b4513",
        "salmon": "#fa8072",
        "sandybrown": "#f4a460",
        "seagreen": "#2e8b57",
        "seashell": "#fff5ee",
        "sienna": "#a0522d",
        "silver": "#c0c0c0",
        "skyblue": "#87ceeb",
        "slateblue": "#6a5acd",
        "slategray": "#708090",
        "slategrey": "#708090",
        "snow": "#fffafa",
        "springgreen": "#00ff7f",
        "steelblue": "#4682b4",
        "tan": "#d2b48c",
        "teal": "#008080",
        "thistle": "#d8bfd8",
        "tomato": "#ff6347",
        "turquoise": "#40e0d0",
        "violet": "#ee82ee",
        "wheat": "#f5deb3",
        "white": "#ffffff",
        "whitesmoke": "#f5f5f5",
        "yellow": "#ffff00",
        "yellowgreen": "#9acd32",
    }

## Hi Hermione, Nice to Meet You!

Now, let's say hi to Hermione!

I will read the CSV file containing basic info about the characters to get a first impression, and select the row about Hermione:

In [4]:
characters = pd.read_csv('/kaggle/input/harry-potter-movies-dataset/Harry_Potter_Movies/Characters.csv',
                         encoding="ISO-8859-1")
characters[characters['Character Name'] == 'Hermione Granger']

Whew, while I of course knew that Hermione is a girl from Gryffindor, I had nearly forgotten that she has an awesome **vine** wand!

According to [J. K. Rowling](https://www.wizardingworld.com/writing-by-jk-rowling/wand-woods) herself:
> *Vine wands are among the less common types, and I have been intrigued to notice that their owners are nearly always those witches or wizards who seek a greater purpose, who have a vision beyond the ordinary and who frequently astound those who think they know them best. Vine wands seem strongly attracted by personalities with hidden depths, and I have found them more sensitive than any other when it comes to instantly detecting a prospective match.* 

Doesn't this description match Hermione's personality precisely?

![andrew-scherle-1sr980fmPks-unsplash.jpg](attachment:abb5073f-afc3-4ce2-bc94-6cafa83662cf.jpg)

## Hermione, How About Small Talk?

I'm very curious to get to know Hermione better. For that, I will take the table of all dialogues, merge it with the characters table, and join everything Hermione says across all the Harry Potter movies into a single string.

Then, I'll do some cleanup. Hermione obviously talks a lot with her friends and family, so I'll remove the first names of the most popular characters from the text. I'll also tokenize the text and remove the basic stopwords (thanks, NLTK, for supporting me here!). Finally, I'll drop some punctuation and collect all the cleaned words together.

With that, I can build a word-frequency dictionary and, tadaaaah, make a word cloud out of it!
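
To make the cleanup rules concrete, here is a minimal sketch on a made-up line of dialogue. It is not the notebook's actual cell: it assumes a plain regular-expression tokenizer as a stand-in for NLTK's `word_tokenize`, a tiny hand-written stopword set instead of NLTK's English list, and a hypothetical `first_names` set, so the exact tokens may differ from what the real cell produces.

```python
import re
from collections import Counter

# Made-up sample text; the real input is all of Hermione's dialogue joined together.
sample = "Harry, I think we should go. I know it's the library, Ron... I think!"

# Assumptions for illustration only: tiny stand-ins for NLTK's stopword list
# and for the first names collected from the characters table.
stopwords = {"i", "we", "it", "the", "should"}
first_names = {"Harry", "Ron"}

# Simplified tokenizer: words (with optional apostrophes) or runs of punctuation.
tokens = re.findall(r"[A-Za-z']+|[^\sA-Za-z']+", sample)

cleaned = [
    token for token in tokens
    if len(token) != 1                    # drops single characters such as "," and "I"
    and "..." not in token                # drops ellipses
    and "'" not in token                  # drops contractions such as "it's"
    and token.lower() not in stopwords    # drops common filler words
    and token not in first_names          # drops friends' first names
]

words_frequencies = dict(Counter(cleaned))
print(words_frequencies)  # {'think': 2, 'go': 1, 'know': 1, 'library': 1}
```

The resulting dictionary has the same shape as the one passed to `generate_from_frequencies` when the cloud is drawn.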

In [None]:
dialogues = pd.read_csv('/kaggle/input/harry-potter-movies-dataset/Harry_Potter_Movies/Dialogue.csv', 
                         encoding = "ISO-8859-1")
dialogues.head()

In [None]:
# Attach character names to every line of dialogue
dialogues_and_characters = dialogues.merge(characters, how='left', on='Character ID')

# Join everything Hermione says into a single string
h_dialogues = ' '.join(
    dialogues_and_characters.loc[
        dialogues_and_characters['Character Name'] == 'Hermione Granger', 'Dialogue'])

# First names of the characters listed in the first half of the characters table
names = characters['Character Name'].to_list()
first_names = {name.split(' ')[0] for name in names[:len(names) // 2] if ' ' in name}

# Needs the NLTK 'stopwords' corpus and tokenizer data to be available locally
stopwords = set(nltk.corpus.stopwords.words('english'))

# Drop single characters, punctuation-like tokens, contractions, "Dumbledore",
# stopwords and first names
unwanted_parts = ('...', "'", ':', '--', 'Dumbledore')
h_dialogues_list = [
    token for token in word_tokenize(h_dialogues)
    if len(token) != 1
    and not any(part in token for part in unwanted_parts)
    and token.lower() not in stopwords
    and token not in first_names
]

words_frequencies = dict(Counter(h_dialogues_list))

In [None]:
words_cloud = WordCloud(background_color='black', width=800, height=400, colormap='Set2',
                        max_words=2000).generate_from_frequencies(words_frequencies)
        
plt.figure(figsize=[15,15])
plt.imshow(words_cloud, interpolation = 'bilinear')
plt.axis("off")
plt.show()

Now, what are the interesting insights here?

Hermione is all about studying hard, reading and generating bright and smart ideas. So of course her most-used words are: **"know"** and **"think"**! And **"Professor"**, **"course"**, **"thought"** and **"wand"** are also prominent. Also, Hermione is always **"going"** to do **"something"** **"well"**, **"got"** her own **"secrets"** and **"could"** make any **"Potion"** for **"us"**! 



## Hermione, Will You Enchant Me?

As we know, Hermione is a powerful and gifted witch who knows many spells. Can we figure out which spell she pronounces most often in the movies? Moreover, some spells produce colored light, so maybe we can also find out which light comes out of the vine wand most often.



### Spell pronounced most often by Hermione

First, I will read the CSV file containing information about the spells. Then I will pick the incantations out of the list of words pronounced by Hermione and count their frequencies.
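
As a rough illustration of that matching step, the sketch below runs it on a few invented tokens. The `incantations` and `tokens` lists are hypothetical stand-ins for the spells table and for the cleaned word list.

```python
from collections import Counter

# Hypothetical stand-ins: a few incantations and some already-cleaned tokens.
incantations = ["Lumos", "Alohomora", "Reparo"]
tokens = ["Lumos", "think", "alohomora", "LUMOS", "library", "Reparo"]

# Lowercase the incantations once so the lookup ignores capitalisation.
known_spells = {spell.lower() for spell in incantations}

# Keep only tokens that are incantations, normalised to "Capitalised" form.
pronounced_spells = [t.capitalize() for t in tokens if t.lower() in known_spells]

spells_frequencies = dict(Counter(pronounced_spells))
print(spells_frequencies)  # {'Lumos': 2, 'Alohomora': 1, 'Reparo': 1}
```

One caveat of matching token by token: an incantation made of several words would never match a single token, so this kind of count only covers one-word spells.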

Now we have to match the spells to the colors. Some spells do not produce light at all - I will use grey color to mark them. Some have colors matching the standard CSS color names - I will map them in that case. Also, there are a couple of exotic colors which I have to map manually.
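
The three colour rules can be sketched as a small lookup function. This is only an illustration in plain Python, not the notebook's implementation: `css_colors`, `exotic_colors` and `light_to_color` are hypothetical names, the CSS table is cut down to three entries, and `None` stands in for a missing value in the spells table.

```python
# Small hypothetical subset of the X11 table; the notebook's `colormap` has the full list.
css_colors = {"blue": "#0000ff", "white": "#ffffff", "red": "#ff0000"}

# Manual hex codes for the two exotic lights, as used in this notebook.
exotic_colors = {"Fiery Orange": "#ff6700", "Scarlet": "#ff2400"}

NO_LIGHT = "#d3d3d3"  # light grey for spells that produce no light


def light_to_color(light):
    """Return a colour string for a light description, or None if it is unmapped."""
    if light is None:
        return NO_LIGHT
    if light.lower() in css_colors:
        return light.lower()         # a CSS colour name can be used as-is
    return exotic_colors.get(light)  # None signals a light that still needs a mapping


for light in ["Blue", None, "Scarlet", "Transparent"]:
    print(light, "->", light_to_color(light))
```

Returning `None` for an unmapped light makes it easy to spot any value that would otherwise end up without a usable colour.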

And finally, with all colors set and frequencies calculated, we can build the resulting chart!

In [None]:
spells = pd.read_csv('/kaggle/input/harry-potter-movies-dataset/Harry_Potter_Movies/Spells.csv')
spells.head()

In [None]:
# Lowercase the known incantations once instead of on every loop iteration
known_spells = {spell.lower() for spell in spells['Incantation'].tolist()}

# Keep the tokens that are incantations, capitalised so they can be looked up
# in the 'Incantation' column
pronounced_spells = [word.capitalize() for word in h_dialogues_list
                     if word.lower() in known_spells]
spells_frequencies = dict(Counter(pronounced_spells))
spells['Count'] = spells['Incantation'].map(spells_frequencies)
h_spells = spells[spells['Incantation'].isin(spells_frequencies)].copy()

# Map each light to a colour: CSS names as-is, two manual hex codes,
# and light grey for spells without light
conditions = [(h_spells['Light'].str.lower().isin(colormap.keys())),
              (h_spells['Light'] == 'Fiery Orange'),
              (h_spells['Light'] == 'Scarlet'),
              (h_spells['Light'].isna())]
values = [h_spells['Light'].str.lower(), '#ff6700', '#ff2400', '#d3d3d3']
h_spells['Color'] = np.select(conditions, values)

In [None]:
colors = h_spells['Color']
fig = go.Figure(data=[go.Pie(labels=h_spells['Incantation'],
                             values=h_spells['Count'])])
fig.update_traces(hoverinfo='label+value+percent', textinfo='value', textfont_size=15,
                  marker=dict(colors=colors, line=dict(color='gray', width=0.5)))

fig.update_layout(title="Spells Pronounced by Hermione by Frequency",
                  title_x=0.45,
                  showlegend=True)

fig.show()


So, Hermione uses **'Lumos!'** most often (three times, to be precise)! And of course it produces white light to illuminate everything around.

Hermione also used many spells twice: **'Alohomora'** to unlock something, **'Confringo'** to cause explosions, **'Finite'** to terminate other spells, **'Obliviate'** to erase memories, **'Relashio'** to make someone release their hold, and **'Reparo'** to fix things.

### Most Common Light of Hermione's Spells

In [None]:
h_spells_colors = h_spells.groupby(['Color'], as_index=False)[['Incantation', 'Count']].agg(
                  {'Incantation': ', '.join, 'Count': 'sum'})
colors = h_spells_colors['Color']
fig = go.Figure(data=[go.Pie(labels=h_spells_colors['Incantation'],
                             values=h_spells_colors['Count'])])
fig.update_traces(hoverinfo='label+value+percent', textinfo='value', textfont_size=15,
                  marker=dict(colors=colors, line=dict(color='gray', width=0.5)))

fig.update_layout(title="Colors of Spells Pronounced by Hermione",
                  title_x=0.45,
                  showlegend=True)
fig.show()

So, 29.2% of spells pronounced by Hermione do not have colored light. 

The most popular light color is blue - 16.7% of the spells produce it: those are 'Alohomora', 'Immobulus' and 'Revelio'. 

Scarlet and orange lights are least popular and occur only once (4.17%) each.
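
As a quick sanity check of these shares, here is a back-of-the-envelope sketch. It assumes 24 spell uses in total, which is what "once (4.17%)" implies; the counts of 7 and 4 are likewise back-calculated from the percentages rather than read from the data.

```python
# Assumption: 24 spell uses in total, inferred from "once (4.17%)"; not read from the data.
total_uses = 24


def share(count, total=total_uses):
    """Percentage of all spell uses, as a float."""
    return 100 * count / total


print(f"{share(7):.1f}%")  # no coloured light -> 29.2%
print(f"{share(4):.1f}%")  # blue              -> 16.7%
print(f"{share(1):.2f}%")  # scarlet or orange -> 4.17%
```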

And what is your favourite spell? :)

![josh-calabrese-_SmO8cRFduA-unsplash.jpg](attachment:9ac3e9e2-3977-4c8d-bd92-48f8713ce8ce.jpg)

## Most Prominent Movie for Hermione

Now, I want to figure out the movies in which Hermione talks the most and the least.

To do this, I will need the chapters table (because our dialogue table is keyed by chapter) and the movies table to match the chapters to movies by ID.
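
The chain from a line of dialogue to its movie can be sketched with plain dictionaries. This is only a toy stand-in for the table merges: every ID and title below is invented for illustration.

```python
from collections import Counter

# Hypothetical toy tables; all IDs and titles are invented for illustration.
chapter_to_movie = {1: 10, 2: 10, 3: 20}       # Chapter ID -> Movie ID
movie_titles = {10: "Movie A", 20: "Movie B"}  # Movie ID -> Movie Title
dialogue_chapters = [1, 1, 2, 3]               # Chapter ID of each dialogue line

# Follow the chain dialogue -> chapter -> movie, then count lines per title.
lines_per_movie = Counter(
    movie_titles[chapter_to_movie[chapter_id]] for chapter_id in dialogue_chapters
)
print(lines_per_movie.most_common())  # [('Movie A', 3), ('Movie B', 1)]
```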

Let us have a look at both:


In [None]:
chapters = pd.read_csv('/kaggle/input/harry-potter-movies-dataset/Harry_Potter_Movies/Chapters.csv', 
                         encoding = "ISO-8859-1")
chapters.head()

In [None]:
movies = pd.read_csv('/kaggle/input/harry-potter-movies-dataset/Harry_Potter_Movies/Movies.csv')
movies.head()

In [None]:
# Attach chapter and movie information to every line of dialogue
chapters_and_characters = dialogues_and_characters.merge(chapters, how='left', on='Chapter ID')
movies_and_characters = chapters_and_characters.merge(movies, how='left', on='Movie ID')

# Keep Hermione's lines and count them per movie
# ('Runtime' serves as the counting column, so rows with a non-null runtime are counted)
h_movies = movies_and_characters.loc[movies_and_characters['Character Name'] == 'Hermione Granger']
h_movies_count = h_movies.groupby(['Movie Title']).count().reset_index()
h_movies_count = h_movies_count[['Movie Title', 'Runtime']].rename(columns={'Runtime': 'Count'})

In [None]:
colors = px.colors.qualitative.G10

In [None]:
labels = h_movies_count['Movie Title']
values = h_movies_count['Count']

fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=.2,
                            marker=dict(colors=colors))])
fig.update_traces(hoverinfo='label+value+percent', textinfo='value', textfont_size=15)

fig.update_layout(title="Movies by Number of Hermione's Dialogs",
                  title_x=0.45,
                  showlegend=True)
fig.show()

## Hermione, Let Us Go Out!

Finally, I would like to know in which locations we can spot Hermione. One important remark here: Hogwarts, along with the train and other premises, is the main stage in all the movies, and Hermione surely spends most of her time there. So I will exclude the Hogwarts category from my charts to get a better view of all the other cool spots.

I will start with loading the places table and merging it with Hermione's movies:

In [None]:
places = pd.read_csv('/kaggle/input/harry-potter-movies-dataset/Harry_Potter_Movies/Places.csv')
places.head()

In [None]:
h_places = h_movies.merge(places, how='left', left_on=['Place ID'], right_on=['Place ID'])
h_places = h_places[~h_places['Place Category'].isin(['Hogwarts'])]
h_places = h_places[~h_places['Place Name'].isin(['Unknown'])]
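
The two filters in the cell above, and the per-place and per-category counts behind the charts in this section, can be sketched on a handful of rows. The rows are invented stand-ins, and the categories assigned to them here are assumptions rather than values taken from the places table.

```python
from collections import Counter

# Hypothetical rows of (place name, place category); values are invented stand-ins.
appearances = [
    ("Great Hall", "Hogwarts"),
    ("12 Grimmauld Place", "Dwellings"),
    ("Unknown", "Other Magical Locations"),
    ("Forest of Dean", "Other Magical Locations"),
    ("12 Grimmauld Place", "Dwellings"),
]

# Drop everything in the Hogwarts category and every place named "Unknown".
outside = [
    (name, category) for name, category in appearances
    if category != "Hogwarts" and name != "Unknown"
]

by_place = Counter(name for name, _ in outside)
by_category = Counter(category for _, category in outside)
print(by_place)
print(by_category)
```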

### Hermione's Favourite Spots

Here, I will visualise locations where Hermione spent most of her time - at least movie time.

In [None]:
h_places_names = h_places.groupby(['Place Name']).count().reset_index()[['Place Name', 'Runtime']]
h_places_names.rename(columns={'Runtime':'Count'}, inplace=True)

labels = h_places_names['Place Name']
values = h_places_names['Count']

fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=.2,
                             marker=dict(colors=colors))])
fig.update_traces(hoverinfo='label+value+percent', textinfo='value', textfont_size=15)

fig.update_layout(title="Hermione and Her Favourite Spots",
                  title_x=0.45,
                  showlegend=True)

And the winner is... of course, it is **'12 Grimmauld Place'**, the headquarters for the Order of the Phoenix, with 17.8% of all non-Hogwarts appearances of Hermione! That's 49 dialogues!

The second most important location is the **Forest of Dean**, with 17% of appearances.

At the Quidditch World Cup, Platform Nine and Three Quarters, and Malfoy Manor, Hermione is not in the spotlight :)

### Hermione's Favourite Types of Spots

We have already identified where Hermione spent most of her time when out of Hogwarts. How about categorizing that?

We will need a different grouping and another visual.

In [None]:
h_places_cats = h_places.groupby(['Place Category']).count().reset_index()[['Place Category', 'Runtime']]
h_places_cats.rename(columns={'Runtime':'Count'}, inplace=True)

labels = h_places_cats['Place Category']
values = h_places_cats['Count']

fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=.2,
                             marker=dict(colors=colors))])
fig.update_traces(hoverinfo='label+value+percent', textinfo='value', textfont_size=15)

fig.update_layout(title="Hermione and Her Favourite Types of Locations",
                  title_x=0.45,
                  showlegend=True)

As you can see, Hermione mostly visits **dwellings** (120 non-Hogwarts appearances, or 43.5%). Note, however, that there is a bunch of so-called "Other Magical Locations", which are not really categorized.

## That's it!

Whew, we've learned a bunch about Hermione today. And at this good-bye moment, I'd love to share a [quote](https://www.inspiringquotes.us/author/5062-emma-watson) from Emma Watson, the only Hermione I can imagine now:
> “*There's nothing wrong with being afraid. It's not the absence of fear, it's overcoming it. Sometimes you've got to blast through and have faith.*”


Have faith everyone, and thanks for joining me today!

![rhii-photography-Xy6FpnFyVjo-unsplash.jpg](attachment:d880f2a2-bbf1-44c1-9031-e3e37a618e76.jpg)


## References

Photo by [Shayna Douglas](https://unsplash.com/@itsmaemedia?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com/s/photos/books-hogwarts?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)

Photo by [Andrew Scherle](https://unsplash.com/@andrewscherle?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com/s/photos/vine-wood?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)

Photo by [Josh Calabrese](https://unsplash.com/@joshcala?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com/s/photos/illuminate?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)

Photo by [Rhii Photography](https://unsplash.com/@rhii?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com/s/photos/harry-potter-ravenclaw?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)