# Data Cleaning

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
data = pd.read_csv('Data/NBCU-dataLaurel.csv')
data.head()

Unnamed: 0,imdbid,title,plot,rating,imdb_rating,metacritic,dvd_release,production,actors,imdb_votes,poster,director,release_date,runtime,genre,awards,keywords,Budget,Box Office Gross
0,tt0010323,The Cabinet of Dr. Caligari,"Hypnotist Dr. Caligari uses a somnambulist, Ce...",UNRATED,8.1,,15-Oct-97,Rialto Pictures,"Werner Krauss, Conrad Veidt, Friedrich Feher, ...",42583,https://images-na.ssl-images-amazon.com/images...,Robert Wiene,19-Mar-21,67 min,"Fantasy, Horror, Mystery",1 nomination.,expressionism|somnambulist|avant-garde|hypnosi...,18000,0
1,tt0052893,Hiroshima Mon Amour,A French actress filming an anti-war film in H...,NOT RATED,8.0,,24-Jun-03,Rialto Pictures,"Emmanuelle Riva, Eiji Okada, Stella Dassas, Pi...",21154,https://images-na.ssl-images-amazon.com/images...,Alain Resnais,16-May-60,90 min,"Drama, Romance",Nominated for 1 Oscar. Another 6 wins & 5 nomi...,memory|atomic-bomb|lovers-separation|impossibl...,88300,0
2,tt0058898,Alphaville,A U.S. secret agent is sent to the distant spa...,NOT RATED,7.2,,20-Oct-98,Rialto Pictures,"Eddie Constantine, Anna Karina, Akim Tamiroff",17801,https://images-na.ssl-images-amazon.com/images...,Jean-Luc Godard,5-May-65,99 min,"Drama, Mystery, Sci-Fi",1 win.,dystopia|french-new-wave|satire|comic-violence...,220000,46585
3,tt0074252,"Ugly, Dirty and Bad",Four generations of a family live crowded toge...,,7.9,,1-Nov-16,Compagnia Cinematografica Champion,"Nino Manfredi, Maria Luisa Santella, Francesco...",5705,https://images-na.ssl-images-amazon.com/images...,Ettore Scola,23-Sep-76,115 min,"Comedy, Drama",1 win & 2 nominations.,incest|failed-murder-attempt|poisoned-food|bap...,6590,0
4,tt0084269,Losing Ground,A comedy-drama about a Black American female p...,,6.3,,,Milestone Film & Video,"Billie Allen, Gary Bolling, Clarence Branch Jr...",132,https://images-na.ssl-images-amazon.com/images...,Kathleen Collins,1-Jun-82,86 min,"Comedy, Drama",,artist|painter|marriage|black-independent-film...,0,0


In [3]:
data.shape

(8468, 19)

These are the variables we have to work with:

imdbid: Unique Id used by IMDB to refer to the movie.

title: Title of the movie

plot: Movie plot summary

rating: MPAA Appropriate audience rating

imdb_rating: IMDB's voters' scoring of a movie on a scale from 1-10 (10 being best)

metacritic: Metacritic movie score on a scale of 0-100 (100 being best)

dvd_release: Movie release date on DVD

production: Principal production company

actors: Lead Actors

imdb_votes: Total votes from IMDB members

poster: Movie Poster artwork

director: Movie director

release_date: Theatrical Release Date

runtime: Runtime length of movie in minutes

genre: Genre Classification

awards: Academy awards & nominations

keywords: Keywords associated with the movie

Budget: Budget spent on movie production, marketing, and distribution

Box Office Gross: Box office gross returns as of 9/21/2017

In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8468 entries, 0 to 8467
Data columns (total 19 columns):
imdbid              8468 non-null object
title               8468 non-null object
plot                8196 non-null object
rating              5252 non-null object
imdb_rating         7735 non-null float64
metacritic          5079 non-null float64
dvd_release         5335 non-null object
production          6758 non-null object
actors              8153 non-null object
imdb_votes          7735 non-null object
poster              7967 non-null object
director            8390 non-null object
release_date        8283 non-null object
runtime             7846 non-null object
genre               8424 non-null object
awards              5242 non-null object
keywords            6381 non-null object
Budget              8468 non-null object
Box Office Gross    8468 non-null object
dtypes: float64(2), object(17)
memory usage: 1.2+ MB


Notice how many of the variables are stored as generic objects. We're going to have to convert a few of these into more useful types.

First, we'll start by changing release_date to a datetime-like type.

In [5]:
data['release_date'].head()

0    19-Mar-21
1    16-May-60
2     5-May-65
3    23-Sep-76
4     1-Jun-82
Name: release_date, dtype: object

In [6]:
pd.to_datetime(data['release_date'])[1],data['release_date'][1]

(Timestamp('2060-05-16 00:00:00'), '16-May-60')

Then we see that there is an issue with pandas' to_datetime function: it places two-digit years such as '60' in the 21st century (2060) instead of the 20th (1960). Perhaps we need to use the datetime package per [this](https://stackoverflow.com/questions/16600548/how-to-parse-string-dates-with-2-digit-year).

In [7]:
import datetime
import numpy as np

dates = data['release_date']

dates = pd.to_datetime(dates)

for i in range(len(dates)):
    if dates[i].year > 2019:
        dates[i] = dates[i].replace(year = dates[i].year-100)

dates.head()

0   1921-03-19
1   1960-05-16
2   1965-05-05
3   1976-09-23
4   1982-06-01
Name: release_date, dtype: datetime64[ns]

Then we've found a way to account for the parser's default pivot year: any date parsed as later than 2019 is shifted back by 100 years.
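The same correction can be sketched with the standard-library datetime package mentioned above. This is a minimal illustration, not the code used in this notebook: the function name `parse_two_digit_year` and its `latest_year` parameter are hypothetical, and the 2019 cutoff simply mirrors the loop above. Python's `%y` directive maps 69-99 to 1969-1999 and 00-68 to 2000-2068, which is why a year like '60' first comes back as 2060.

```python
from datetime import datetime

def parse_two_digit_year(text, latest_year=2019):
    """Parse a 'day-Mon-yy' string and push implausibly late years back a century.

    `latest_year` is an assumed cutoff: no film in this dataset should be
    dated after it, so anything later is treated as a 1900s date.
    """
    parsed = datetime.strptime(text, "%d-%b-%y")
    if parsed.year > latest_year:
        parsed = parsed.replace(year=parsed.year - 100)
    return parsed

# Sample strings copied from the release_date and dvd_release columns shown above.
samples = ["19-Mar-21", "16-May-60", "1-Jun-82", "15-Oct-97"]
fixed = [parse_two_digit_year(s) for s in samples]
```

A single cutoff is a blunt tool: a film genuinely released in 1915 and one released in 2015 would both arrive as '15', and this sketch cannot tell them apart.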

In [8]:
data['release_date']=dates
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8468 entries, 0 to 8467
Data columns (total 19 columns):
imdbid              8468 non-null object
title               8468 non-null object
plot                8196 non-null object
rating              5252 non-null object
imdb_rating         7735 non-null float64
metacritic          5079 non-null float64
dvd_release         5335 non-null object
production          6758 non-null object
actors              8153 non-null object
imdb_votes          7735 non-null object
poster              7967 non-null object
director            8390 non-null object
release_date        8283 non-null datetime64[ns]
runtime             7846 non-null object
genre               8424 non-null object
awards              5242 non-null object
keywords            6381 non-null object
Budget              8468 non-null object
Box Office Gross    8468 non-null object
dtypes: datetime64[ns](1), float64(2), object(16)
memory usage: 1.2+ MB


It seems natural to also do the same for dvd_release.

In [9]:
dates = data['dvd_release']

dates = pd.to_datetime(dates)

for i in range(len(dates)):
    if dates[i].year > 2019:
        dates[i] = dates[i].replace(year = dates[i].year-100)

data['dvd_release'] = dates
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8468 entries, 0 to 8467
Data columns (total 19 columns):
imdbid              8468 non-null object
title               8468 non-null object
plot                8196 non-null object
rating              5252 non-null object
imdb_rating         7735 non-null float64
metacritic          5079 non-null float64
dvd_release         5335 non-null datetime64[ns]
production          6758 non-null object
actors              8153 non-null object
imdb_votes          7735 non-null object
poster              7967 non-null object
director            8390 non-null object
release_date        8283 non-null datetime64[ns]
runtime             7846 non-null object
genre               8424 non-null object
awards              5242 non-null object
keywords            6381 non-null object
Budget              8468 non-null object
Box Office Gross    8468 non-null object
dtypes: datetime64[ns](2), float64(2), object(15)
memory usage: 1.2+ MB


Next, we have several important numerical variables that are currently stored as object types. First, we'll work with imdb_votes.

In [10]:
data['imdb_votes'].head()

0    42,583
1    21,154
2    17,801
3     5,705
4       132
Name: imdb_votes, dtype: object

So we need to convert imdb_votes to integers. Since many of the numbers contain commas, we cannot simply tell pandas to treat each entry as an integer via the .astype() method. We'll first have to replace each comma with an empty string, then convert the result to int. We also have to take care to skip the missing values in imdb_votes, as we will be dealing with those later.

In [11]:
votes = data['imdb_votes']
votes_parsed = votes[votes.str.find(',')>0].apply(lambda x: x.replace(',',''))
for i in votes_parsed.index:
    votes[i] = votes_parsed[i]
votes.head()

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  after removing the cwd from sys.path.


0    42583
1    21154
2    17801
3     5705
4      132
Name: imdb_votes, dtype: object

Then, we've successfully removed all of the commas. Let's confirm that we didn't lose any datapoints along the way.

In [12]:
[len(votes), len(data['imdb_votes'])]

[8468, 8468]
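The loop above works but assigns into a column slice, which is what triggers the copy warning. As a hedged alternative, the comma removal can be sketched with the vectorized `Series.str.replace`, which returns a new Series and leaves missing values untouched. The small `votes` Series below is made-up sample data standing in for `data['imdb_votes']`.

```python
import numpy as np
import pandas as pd

# Stand-in for data['imdb_votes']: strings with thousands separators plus a missing value.
votes = pd.Series(["42,583", "21,154", "132", np.nan], name="imdb_votes")

# regex=False treats ',' as a literal character; NaN entries stay NaN.
cleaned = votes.str.replace(",", "", regex=False)

# Same sanity check as above: no rows gained or lost.
same_length = len(cleaned) == len(votes)
```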

In order to convert to int, we have to find a way to work around missing values. Let's replace all the missing values with -1 and then convert them back to NaN after conversion.

In [13]:
import numpy as np

votes_int = votes.fillna(-1).astype('int')
votes_int[votes_int==-1] = np.nan
votes_int.isna().sum() == votes.isna().sum()

True

Then we see that we've successfully preserved the NaN cases.
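Note that putting NaN back forces the column to float64, as the next `info()` output shows. If true integers are wanted, one option is pandas' nullable `Int64` dtype (capital I), available in pandas 0.24 and later. The sketch below assumes that pandas version and uses a small made-up Series rather than the real column.

```python
import numpy as np
import pandas as pd

# Stand-in for the comma-free vote strings, with one missing value.
votes = pd.Series(["42583", "21154", "132", np.nan], name="imdb_votes")

# to_numeric yields float64 with NaN; the nullable Int64 dtype then stores
# integers and represents the missing entry as <NA>, so no -1 sentinel is needed.
votes_int = pd.to_numeric(votes).astype("Int64")
```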

In [14]:
data['imdb_votes'] = votes_int
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8468 entries, 0 to 8467
Data columns (total 19 columns):
imdbid              8468 non-null object
title               8468 non-null object
plot                8196 non-null object
rating              5252 non-null object
imdb_rating         7735 non-null float64
metacritic          5079 non-null float64
dvd_release         5335 non-null datetime64[ns]
production          6758 non-null object
actors              8153 non-null object
imdb_votes          7735 non-null float64
poster              7967 non-null object
director            8390 non-null object
release_date        8283 non-null datetime64[ns]
runtime             7846 non-null object
genre               8424 non-null object
awards              5242 non-null object
keywords            6381 non-null object
Budget              8468 non-null object
Box Office Gross    8468 non-null object
dtypes: datetime64[ns](2), float64(3), object(14)
memory usage: 1.2+ MB


Now we have to deal with 'Budget' and 'Box Office Gross' in a similar manner.

In [15]:
data[['Budget', 'Box Office Gross']].head()

Unnamed: 0,Budget,Box Office Gross
0,18000,0
1,88300,0
2,220000,46585
3,6590,0
4,0,0


In these first rows, it looks like we don't have to worry about commas in 'Budget' or in 'Box Office Gross', so the conversions should be much simpler. Upon further analysis, however, it turns out that there are entries in Euros instead of USD. We will have to parse out the 'EU' prefix and then convert that number to USD.

In [16]:
budget = data['Budget'].astype('str')
data[budget.str.contains('EU')]

Unnamed: 0,imdbid,title,plot,rating,imdb_rating,metacritic,dvd_release,production,actors,imdb_votes,poster,director,release_date,runtime,genre,awards,keywords,Budget,Box Office Gross
2866,tt4538016,Unless,A writer struggles with her daughter's decisio...,,5.8,,NaT,,"Catherine Keener, Matt Craven, Hannah Gross, C...",32.0,https://images-na.ssl-images-amazon.com/images...,Alan Gilsenan,2016-09-11,90 min,Drama,1 nomination.,,"EU 3,973,431",0


Since there's only 1 value containing EU, it's simple enough to just assign the proper value to it. This movie was released in 2016, so we'll have to use the 2016 Euro to USD exchange rate (1.11), found [here](https://www.statista.com/statistics/412794/euro-to-u-s-dollar-annual-average-exchange-rate/).

In [17]:
3973431 * 1.11

4410508.41

In [18]:
budget[budget.str.contains('EU')] = '4410508'
budget.str.contains('EU').sum()

0

Once we try to apply .astype('int'), we encounter another case: 'CAD'.

In [19]:
data[budget.str.contains('CAD')]

Unnamed: 0,imdbid,title,plot,rating,imdb_rating,metacritic,dvd_release,production,actors,imdb_votes,poster,director,release_date,runtime,genre,awards,keywords,Budget,Box Office Gross
5203,tt1092082,Passchendaele,"The lives of a troubled veteran, his nurse gir...",R,6.5,,2009-11-03,Alliance Atlantis,"Paul Gross, Caroline Dhavernas, Joe Dinicol, M...",7246.0,https://images-na.ssl-images-amazon.com/images...,Paul Gross,2008-10-17,114 min,"Drama, History, Romance",11 wins & 5 nominations.,battle|veteran|canadian-armed-forces|canadian-...,"CAD 20,000,000",0
5764,tt1376195,Gunless,A hardened American gunslinger is repeatedly t...,,6.5,,2011-08-08,Cinema Eopch,"Paul Gross, Sienna Guillory, Dustin Milligan, ...",3157.0,https://images-na.ssl-images-amazon.com/images...,William Phillips,2010-04-30,89 min,"Action, Comedy, Drama",5 wins & 5 nominations.,gunslinger|duel|wild-west|bounty-hunter|blacks...,"CAD 10,000,000",0


Once again, it's simple enough to just change these two values by hand since there are only 2. CAD to USD exchange rate found [here](https://fxtop.com/en/historical-currency-converter.php?A=100&C1=USD&C2=USD&DD=01&MM=01&YYYY=2008&B=1&P=&I=1&btnOK=Go%21).

In [29]:
[20000000*0.981523, 10000000*1.050118]

[19630460.0, 10501180.000000002]

In [30]:
budget[5203] = '19630460'
budget[5764] = '10501180'

In [48]:
budget.astype('int').head()

0     18000
1     88300
2    220000
3      6590
4         0
Name: Budget, dtype: int64

There aren't any more errors when converting to int, so we've finished parsing Budget.
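With only three affected rows, fixing them by hand is reasonable. If more prefixed values turned up, the same steps could be wrapped in a helper along these lines. This is only a sketch: `budget_to_usd` and `RATES` are hypothetical names, the rates are the ones used in the cells above, and the pattern assumes every prefixed value looks like the three rows shown (uppercase currency code, then comma-grouped digits).

```python
import re

# Exchange rates copied from the cells above, keyed by (currency prefix, release year).
RATES = {("EU", 2016): 1.11, ("CAD", 2008): 0.981523, ("CAD", 2010): 1.050118}

def budget_to_usd(raw, year, rates=RATES):
    """Convert a Budget entry such as '18000' or 'CAD 20,000,000' to whole USD."""
    match = re.fullmatch(r"\s*([A-Z]+)?\s*([\d,]+)\s*", str(raw))
    if match is None:
        raise ValueError(f"unrecognized budget value: {raw!r}")
    code, digits = match.groups()
    amount = int(digits.replace(",", ""))
    if code is None:
        return amount  # no prefix: assumed to be USD already
    # A missing (code, year) pair raises KeyError rather than guessing a rate.
    # int() truncates, matching the hand-entered values above.
    return int(amount * rates[(code, year)])

converted = [
    budget_to_usd("EU 3,973,431", 2016),
    budget_to_usd("CAD 20,000,000", 2008),
    budget_to_usd("CAD 10,000,000", 2010),
    budget_to_usd("18000", 1921),
]
```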

In [51]:
budget = budget.astype('int')
data['Budget'] = budget
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8468 entries, 0 to 8467
Data columns (total 19 columns):
imdbid              8468 non-null object
title               8468 non-null object
plot                8196 non-null object
rating              8468 non-null object
imdb_rating         7735 non-null float64
metacritic          5079 non-null float64
dvd_release         5335 non-null datetime64[ns]
production          6758 non-null object
actors              8153 non-null object
imdb_votes          7735 non-null float64
poster              7967 non-null object
director            8390 non-null object
release_date        8283 non-null datetime64[ns]
runtime             7846 non-null object
genre               8424 non-null object
awards              5242 non-null object
keywords            6381 non-null object
Budget              8468 non-null int64
Box Office Gross    8468 non-null object
dtypes: datetime64[ns](2), float64(3), int64(1), object(13)
memory usage: 1.2+ MB


In [21]:
data.isna().sum()

imdbid                 0
title                  0
plot                 272
rating              3216
imdb_rating          733
metacritic          3389
dvd_release         3133
production          1710
actors               315
imdb_votes           733
poster               501
director              78
release_date         185
runtime              622
genre                 44
awards              3226
keywords            2087
Budget                 0
Box Office Gross       0
dtype: int64

Clearly we are going to have to do something about these missing values. We'll look at what each of these variables is and handle its missing values on a case-by-case basis.

In [22]:
data['rating'].unique()

array(['UNRATED', 'NOT RATED', nan, 'PG', 'G', 'TV-PG', 'PG-13', 'R',
       'TV-14', 'TV-MA', 'M', 'NC-17', 'APPROVED', 'X', 'NR', 'TV-G',
       'TV-Y', 'Unrated'], dtype=object)

Already I get the impression that the missing values should just be 'UNRATED' or 'NOT RATED'. First, we should figure out the distinction between unrated and not rated movies.

Not rated movies are movies that were not submitted to the MPAA for ratings.

Unrated movies are those that have had scenes altered/omitted/added that may or may not have an effect on a movie's rating. Usually this only happens with DVD releases. Using this knowledge, it makes sense to turn all the missing values into 'NOT RATED' categories. There's no good reason to justify giving them a rating.

Another solution is to find another dataset containing each movie and their respective ratings.

In [23]:
data[data['rating'].isna()].head()

Unnamed: 0,imdbid,title,plot,rating,imdb_rating,metacritic,dvd_release,production,actors,imdb_votes,poster,director,release_date,runtime,genre,awards,keywords,Budget,Box Office Gross
3,tt0074252,"Ugly, Dirty and Bad",Four generations of a family live crowded toge...,,7.9,,2016-11-01,Compagnia Cinematografica Champion,"Nino Manfredi, Maria Luisa Santella, Francesco...",5705.0,https://images-na.ssl-images-amazon.com/images...,Ettore Scola,1976-09-23,115 min,"Comedy, Drama",1 win & 2 nominations.,incest|failed-murder-attempt|poisoned-food|bap...,6590,0
4,tt0084269,Losing Ground,A comedy-drama about a Black American female p...,,6.3,,NaT,Milestone Film & Video,"Billie Allen, Gary Bolling, Clarence Branch Jr...",132.0,https://images-na.ssl-images-amazon.com/images...,Kathleen Collins,1982-06-01,86 min,"Comedy, Drama",,artist|painter|marriage|black-independent-film...,0,0
5,tt0085180,L'argent,A forged 500-franc note is cynically passed fr...,,7.5,95.0,2005-05-24,Criterion Collection,"Christian Patey, Vincent Risterucci, Caroline ...",5607.0,https://images-na.ssl-images-amazon.com/images...,Robert Bresson,1983-05-18,85 min,"Crime, Drama",2 wins & 3 nominations.,note|murder|solitary-confinement|robbery|deliv...,0,0
9,tt0103935,Rebels of the Neon God,"Within the urban gloom of Taipei, four youths ...",,7.6,82.0,2015-10-27,Big World Pictures,"Chao-jung Chen, Chang-Bin Jen, Kang-sheng Lee,...",2155.0,https://images-na.ssl-images-amazon.com/images...,Ming-liang Tsai,1994-08-04,106 min,"Crime, Drama",5 wins & 5 nominations.,taipei|cigarette-smoking|hotel-room|kissing|ph...,28422,0
13,tt0110998,River of Grass,"Cozy, a dissatisfied housewife, meets Lee at a...",,6.5,69.0,2003-03-18,Oscilloscope Laboratories,"Larry Fessenden, Dick Russell, Stan Kaplan, Mi...",582.0,https://images-na.ssl-images-amazon.com/images...,Kelly Reichardt,1995-10-13,76 min,Drama,6 nominations.,f-rated|title-directed-by-female|directorial-d...,8534,0


Looking at the first few movies with missing ratings, let's get a sense of whether they just have no rating at all, or if the dataset is just missing data.

[Ugly, Dirty, and Bad](https://www.rottentomatoes.com/m/ugly_dirty_and_bad) has a rating of 'NR', Not Rated.

[Losing Ground](https://www.rottentomatoes.com/m/losing_ground_1982) has a rating of 'NR', Not Rated.

[L'argent](https://www.rottentomatoes.com/m/largent) has a rating of 'NR', Not Rated.

[Rebels of the Neon God](https://www.rottentomatoes.com/m/rebels_of_the_neon_god) has a rating of 'NR', Not Rated. Strangely enough, this movie is listed there as having come out on Apr 10, 2015, whereas the dataset has August 4, 1994.

[River of Grass](https://www.rottentomatoes.com/m/river_of_grass) has a rating of 'NR', Not Rated.

Then it seems reasonable to assign 'NOT RATED' to each missing rating.

In [24]:
data['rating'] = data['rating'].fillna('NOT RATED')
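The `unique()` output above also lists 'Unrated' next to 'UNRATED' and 'NR' next to 'NOT RATED'. The notebook leaves these spellings alone. If they should be merged, one possible sketch is shown below. It assumes 'NR' means the same thing as 'NOT RATED', which is how the Rotten Tomatoes pages above use it, and it runs on a small made-up Series rather than the real column.

```python
import numpy as np
import pandas as pd

# Sample labels taken from the unique() output above, plus one missing value.
rating = pd.Series(["UNRATED", "Unrated", "NR", "NOT RATED", np.nan, "PG-13"])

normalized = (
    rating.str.upper()                 # 'Unrated' -> 'UNRATED'
    .replace({"NR": "NOT RATED"})      # assumed synonym, see note above
    .fillna("NOT RATED")               # same fill as the cell above
)
```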

In [25]:
data.isna().sum()

imdbid                 0
title                  0
plot                 272
rating                 0
imdb_rating          733
metacritic          3389
dvd_release         3133
production          1710
actors               315
imdb_votes           733
poster               501
director              78
release_date         185
runtime              622
genre                 44
awards              3226
keywords            2087
Budget                 0
Box Office Gross       0
dtype: int64

In [26]:
data['imdb_rating'].unique()

array([ 8.1,  8. ,  7.2,  7.9,  6.3,  7.5,  7.8,  7.3,  7.6,  6.5,  8.8,
        7.7,  6.7,  5.2,  7. ,  6.2,  6.1,  6. ,  5.1,  6.6,  3.3,  6.9,
        5.9,  nan,  6.8,  7.1,  8.3,  7.4,  6.4,  5.7,  5.5,  5.4,  3.8,
        4.2,  4.3,  5.8,  5.6,  3.5,  5. ,  3.4,  8.6,  4.8,  4.9,  8.2,
        4.7,  4.6,  4. ,  4.4,  5.3,  8.5,  4.5,  4.1,  3.1,  8.4,  3.6,
        3.7,  1.6,  2.9,  3.9,  2.6,  3.2,  8.7,  9.2,  2.1,  9.4,  1.8,
        8.9,  2.3,  2.7,  9.5,  1.9,  9. ,  9.7,  9.3,  9.1,  2.8,  2.4,
        2. ,  2.2,  1.1,  3. ,  1.4,  2.5,  1.7,  9.8, 10. ,  9.9])

The first thing I notice is that we have fractional ratings (e.g. 8.5, 1.4, 2.2). Also, this is ordinal data: a movie rated 8 is better than a movie rated 7, which is better than a movie rated 6, and so on. So we will bin the ratings into the set [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] after we deal with the missing data.
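The binning rule itself is not fixed at this point, so the following is only one possible sketch. It assumes ratings are rounded down (so 8.5 lands in bin 8) and uses the nullable `Int64` dtype so that missing ratings survive the conversion. Rounding to the nearest integer would be an equally defensible choice.

```python
import numpy as np
import pandas as pd

# A few values from the unique() output above, including the missing case.
ratings = pd.Series([8.1, 8.0, 7.2, np.nan, 10.0, 1.1], name="imdb_rating")

# Floor each rating to its integer bin; NaN passes through and becomes <NA>.
binned = np.floor(ratings).astype("Int64")
```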

In [27]:
data[data['imdb_rating'].isna()]

Unnamed: 0,imdbid,title,plot,rating,imdb_rating,metacritic,dvd_release,production,actors,imdb_votes,poster,director,release_date,runtime,genre,awards,keywords,Budget,Box Office Gross
36,tt0427340,Masters of the Universe,"He-Man, the most powerful man in the universe,...",NOT RATED,,,NaT,Sony Pictures Entertainment,,,,,2019-12-18,,"Action, Adventure, Drama",,sword|sword-and-sorcery|sword-and-fantasy|spin...,0,0
42,tt0437086,Alita: Battle Angel,"In the twenty-sixth century, a female cyborg i...",NOT RATED,,,NaT,ADV Films,"Michelle Rodriguez, Jennifer Connelly, Mahersh...",,https://images-na.ssl-images-amazon.com/images...,Robert Rodriguez,2018-07-20,,"Action, Adventure, Romance",,based-on-manga|cyborg|female-cyborg|26th-centu...,200000000,0
46,tt0448115,Shazam!,A boy is given the ability to become an adult ...,NOT RATED,,,NaT,Warner Bros.,Dwayne Johnson,,,David F. Sandberg,2019-04-05,,"Action, Fantasy, Sci-Fi",,shazam|wizard|superhero|magic|dc-extended-univ...,0,0
55,tt0460890,The Only Living Boy in New York,"Adrift in New York City, a recent college grad...",NOT RATED,,,NaT,Amazon Studios & Roadside Attractions,"Kate Beckinsale, Pierce Brosnan, Jeff Bridges,...",,https://images-na.ssl-images-amazon.com/images...,Marc Webb,2017-08-11,,Drama,,title-based-on-song,163145,0
75,tt0491175,Suburbicon,A home invasion rattles a quiet family town.,R,,,NaT,,"Matt Damon, Julianne Moore, Oscar Isaac, Megan...",,,George Clooney,2017-11-03,,"Comedy, Crime, Mystery",,one-word-title|dark-comedy,0,0
76,tt0491203,Tulip Fever,An artist falls for a young married woman whil...,R,,,NaT,The Weinstein Company,"Cara Delevingne, Alicia Vikander, Dane DeHaan,...",,https://images-na.ssl-images-amazon.com/images...,Justin Chadwick,2017-08-25,107 min,"Drama, Romance",,17th-century|portrait|artist|amsterdam|tulips|...,25000000,0
151,tt0974015,Justice League,Fueled by his restored faith in humanity and i...,NOT RATED,,,NaT,Warner Bros. Pictures,"Gal Gadot, Robin Wright, Connie Nielsen, Jason...",,https://images-na.ssl-images-amazon.com/images...,Zack Snyder,2017-11-17,,"Action, Adventure, Fantasy",,justice-league|superhero-teamup|comicbook-movi...,0,0
191,tt1072748,Winchester Mystery House,,NOT RATED,,,NaT,,Helen Mirren,,,"Michael Spierig, Peter Spierig",NaT,,Thriller,,,0,0
217,tt1131724,2:22,A man's life is derailed when an ominous patte...,PG-13,,,2017-06-30,,"Teresa Palmer, Michiel Huisman, Sam Reid, Maev...",,https://images-na.ssl-images-amazon.com/images...,Paul Currie,2017-06-30,,Thriller,,time-for-title|number-in-title,0,0
221,tt1137450,Death Wish,A mild-mannered father is transformed into a k...,R,,,NaT,,"Jack Kesy, Bruce Willis, Vincent D'Onofrio, El...",,,Eli Roth,NaT,,"Action, Crime, Drama",,remake|based-on-novel|vigilante,0,0


Notice that a fair number of the movies that are missing IMDb ratings are also missing metacritic ratings, DVD release dates, and posters, and show a box office gross of 0. On top of this, many have release dates after 9/21/2017. There is no reason to try to impute these values: the movies had not been released when the data was collected, so essentially all of their relevant information is missing.
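To see how many of the unrated rows fall into this "not yet released" group, a boolean mask can be built against the snapshot date. The sketch below uses a tiny made-up frame with a few titles from the tables above. The 9/21/2017 cutoff comes from the Box Office Gross description, and the variable names are illustrative.

```python
import pandas as pd

movies = pd.DataFrame({
    "title": ["Tulip Fever", "Justice League", "Winchester Mystery House", "Alphaville"],
    "imdb_rating": [None, None, None, 7.2],
    "release_date": pd.to_datetime(["2017-08-25", "2017-11-17", None, "1965-05-05"]),
})

snapshot = pd.Timestamp("2017-09-21")  # date the dataset was collected

missing_rating = movies["imdb_rating"].isna()
# Comparisons against NaT are False, so rows without a release date are not flagged.
unreleased = missing_rating & (movies["release_date"] > snapshot)
```

Rows such as Tulip Fever (released before the snapshot) or Winchester Mystery House (no release date at all) are not caught by this mask and would still need to be inspected separately.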