---
layout: post
title:  "Rating Tags in Works Part II"
date:   2021-05-28
categories: data_cleaning visualization
tags: Python Pandas 
---

In Part II, we'll create animated charts showing the ratings of works on AO3.

* Table of Contents
{:toc}

# Loading the Files

In Part I, we created a CSV file named rating.csv. Here we'll load it along with works-20210226.csv from the official AO3 data dump.

```python
# Load Python libraries
import pandas as pd
```

```python
# Load the works file
works = pd.read_csv("/home/pi/Downloads/works-20210226.csv")
```
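The works file has more than seven million rows. If memory or date handling becomes a problem when building the charts, a leaner load like the sketch below may help. It is an optional variant, not what this post runs: it assumes only the `creation date` and `tags` columns are needed, and it reads two rows copied from the works preview in this post through `io.StringIO` instead of the real file.

```python
import io

import pandas as pd

# Two rows copied from the works preview, with the same header as
# works-20210226.csv (the trailing comma produces an extra empty column)
sample = io.StringIO(
    "creation date,language,restricted,complete,word_count,tags,\n"
    "2008-09-13,en,False,True,1392.0,78+77+84+107+23+10+16+70+933+616,\n"
    "2008-09-13,en,False,True,1755.0,77+78+69+108+109+62+110+23+9+111+16+70+10128+4858,\n"
)

# usecols keeps only the listed columns, which reduces memory use;
# parse_dates converts 'creation date' to datetime64 for time-based grouping
works = pd.read_csv(
    sample,
    usecols=["creation date", "tags"],
    parse_dates=["creation date"],
)

print(works.dtypes)
```

On the real file, pass the path instead of `sample`. Dropping `usecols` keeps every column if later steps need them.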

```python
# Load rating.csv from Part I
rating = pd.read_csv("rating.csv")
```

```python
# Preview the rating table
rating
```

|   | id | type | name | canonical | cached_count | merger_id |
|---|---|---|---|---|---|---|
| 0 | 9 | Rating | Not Rated | True | 825385 |  |
| 1 | 10 | Rating | General Audiences | True | 2115153 |  |
| 2 | 11 | Rating | Teen And Up Audiences | True | 2272688 |  |
| 3 | 12 | Rating | Mature | True | 1151260 |  |
| 4 | 13 | Rating | Explicit | True | 1238331 |  |
| 5 | 12766726 | Rating | Teen & Up Audiences | False | 333 |  |


```python
# Preview the works table
works
```

|   | creation date | language | restricted | complete | word_count | tags | Unnamed: 6 |
|---|---|---|---|---|---|---|---|
| 0 | 2021-02-26 | en | False | True | 388.0 | 10+414093+1001939+4577144+1499536+110+4682892+... |  |
| 1 | 2021-02-26 | en | False | True | 1638.0 | 10+20350917+34816907+23666027+23269305+2326930... |  |
| 2 | 2021-02-26 | en | False | True | 1502.0 | 10+10613413+9780526+3763877+3741104+7657229+30... |  |
| 3 | 2021-02-26 | en | False | True | 100.0 | 10+15322+54862755+20595867+32994286+663+471751... |  |
| 4 | 2021-02-26 | en | False | True | 994.0 | 11+721553+54604+1439500+3938423+53483274+54862... |  |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 7269688 | 2008-09-13 | en | True | True | 705.0 | 78+77+84+101+104+105+106+23+13+16+70+933 |  |
| 7269689 | 2008-09-13 | en | False | True | 1392.0 | 78+77+84+107+23+10+16+70+933+616 |  |
| 7269690 | 2008-09-13 | en | False | True | 1755.0 | 77+78+69+108+109+62+110+23+9+111+16+70+10128+4858 |  |
| 7269691 | 2008-09-13 | en | False | True | 1338.0 | 112+113+13+114+16+115+101+117+118+119+120+116+... |  |


# Data Cleaning

Part I walked through each step of cleaning and preparing the works DataFrame for visualization, so here we'll skip the explanations.

```python
# Drop rows with a missing value in the tags column
works = works.dropna(subset=['tags'])
```
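To see what `subset=['tags']` does, here is a toy frame with made-up values: only rows missing `tags` are dropped, while a row missing some other column, such as `word_count`, is kept.

```python
import pandas as pd

# Toy data: row 1 lacks a word_count, row 2 lacks tags
toy = pd.DataFrame({
    "word_count": [388.0, None, 1502.0],
    "tags": ["10+414093", "11+721553", None],
})

# Only the tags column is checked, so row 1 survives and row 2 is dropped
cleaned = toy.dropna(subset=["tags"])
print(cleaned)
```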

```python
# Find the minimum tag ID in a '+'-separated tag string and return it
def find_rating(x):
    return min(int(n) for n in x.split('+'))
```
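As a quick check, the function can be run on the three complete tag strings from the end of the works preview. This relies on the same premise as the code above, namely that a work's rating tag has the smallest ID in its tag string; the results match the `rating` column shown in the next output.

```python
def find_rating(x):
    # Split the '+'-separated tag IDs, convert them to integers,
    # and return the smallest one
    return min(int(n) for n in x.split('+'))

# Complete tag strings from the last rows of the works preview
examples = {
    7269688: "78+77+84+101+104+105+106+23+13+16+70+933",
    7269689: "78+77+84+107+23+10+16+70+933+616",
    7269690: "77+78+69+108+109+62+110+23+9+111+16+70+10128+4858",
}

for row, tags in examples.items():
    print(row, find_rating(tags))
# 7269688 13
# 7269689 10
# 7269690 9
```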

```python
# Create a new column named 'rating'
# Series.apply() calls find_rating on each value in the tags column
works['rating'] = works['tags'].apply(find_rating)

# Preview the result
works
```

|   | creation date | language | restricted | complete | word_count | tags | Unnamed: 6 | rating |
|---|---|---|---|---|---|---|---|---|
| 0 | 2021-02-26 | en | False | True | 388.0 | 10+414093+1001939+4577144+1499536+110+4682892+... |  | 10 |
| 1 | 2021-02-26 | en | False | True | 1638.0 | 10+20350917+34816907+23666027+23269305+2326930... |  | 10 |
| 2 | 2021-02-26 | en | False | True | 1502.0 | 10+10613413+9780526+3763877+3741104+7657229+30... |  | 10 |
| 3 | 2021-02-26 | en | False | True | 100.0 | 10+15322+54862755+20595867+32994286+663+471751... |  | 10 |
| 4 | 2021-02-26 | en | False | True | 994.0 | 11+721553+54604+1439500+3938423+53483274+54862... |  | 11 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7269688 | 2008-09-13 | en | True | True | 705.0 | 78+77+84+101+104+105+106+23+13+16+70+933 |  | 13 |
| 7269689 | 2008-09-13 | en | False | True | 1392.0 | 78+77+84+107+23+10+16+70+933+616 |  | 10 |
| 7269690 | 2008-09-13 | en | False | True | 1755.0 | 77+78+69+108+109+62+110+23+9+111+16+70+10128+4858 |  | 9 |
| 7269691 | 2008-09-13 | en | False | True | 1338.0 | 112+113+13+114+16+115+101+117+118+119+120+116+... |  | 13 |
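`apply` calls `find_rating` once per work in plain Python, which can be slow on millions of rows. A vectorized alternative, not the method used in this post, is sketched below: split the strings with pandas string methods, `explode` them into one ID per row, and take the minimum per original row. It assumes the index is unique (true for the default index from `read_csv`, and still true after `dropna`), and `explode` may need considerably more memory than `apply` on the full data set.

```python
import pandas as pd

# Two tag strings from the works preview, keeping their original index
tags = pd.Series(
    ["78+77+84+107+23+10+16+70+933+616",
     "77+78+69+108+109+62+110+23+9+111+16+70+10128+4858"],
    index=[7269689, 7269690],
)

ratings = (
    tags.str.split("+")    # each cell becomes a list of ID strings
        .explode()         # one row per ID; the original index repeats
        .astype(int)       # compare IDs numerically, not as text
        .groupby(level=0)  # regroup by the original row index
        .min()             # smallest ID per work
)

print(ratings)
```

On the real data this would read `works['rating'] = works['tags'].str.split('+').explode().astype(int).groupby(level=0).min()`, with the result aligned back to `works` by index.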


488 works have no rating: the smallest tag ID in their tag string is not one of the rating IDs in rating.csv. This is a [known issue](https://otwarchive.atlassian.net/browse/AO3-6065) that the volunteers are actively working on behind the scenes, so for now we simply drop these works from our data set.
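A count like this can be checked before dropping anything by negating `isin()` and summing the resulting booleans. The sketch below uses the rating IDs from rating.csv and a few hypothetical extracted values, where `1` stands in for a work whose smallest tag ID is not a rating tag; on the real data the same expression would be `(~works['rating'].isin(rating['id'])).sum()`.

```python
import pandas as pd

# Rating tag IDs from rating.csv
rating = pd.DataFrame({"id": [9, 10, 11, 12, 13, 12766726]})

# Hypothetical extracted 'rating' values; 1 is not a rating tag ID
works = pd.DataFrame({"rating": [10, 11, 1, 13, 9]})

# True where the extracted ID is not one of the rating IDs
no_rating = ~works["rating"].isin(rating["id"])
print(no_rating.sum())
print(works.loc[no_rating, "rating"].value_counts())
```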

```python
# Drop works with no rating: keep only rows whose extracted
# rating is one of the IDs in rating.csv
works = works[works['rating'].isin(rating['id'])]
```
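If the charts should show rating names instead of numeric IDs, one option is to build an `id` to `name` lookup from rating.csv and `map` each work's rating through it. This is a sketch with a few hypothetical works; the column name `rating_name` is an assumption, not something used elsewhere in this post.

```python
import pandas as pd

# The id and name columns from rating.csv
rating = pd.DataFrame({
    "id": [9, 10, 11, 12, 13, 12766726],
    "name": ["Not Rated", "General Audiences", "Teen And Up Audiences",
             "Mature", "Explicit", "Teen & Up Audiences"],
})

# Hypothetical cleaned works with extracted rating IDs
works = pd.DataFrame({"rating": [10, 13, 9, 10]})

# Index the names by id, then look up each work's rating ID
id_to_name = rating.set_index("id")["name"]
works["rating_name"] = works["rating"].map(id_to_name)

print(works["rating_name"].value_counts())
```

With this mapping, the non-canonical "Teen & Up Audiences" tag (ID 12766726) gets its own label; whether to fold it into "Teen And Up Audiences" is a separate choice.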