# Movie Curiosities: An IMDB Exploration (code-free)

## Introduction

In this project, I'm going to explore a movie dataset to try to answer a few broad questions about movie budgets, revenues, scores, and so on. I built this dataset by combining three separate ones; if you're interested, you can see how I did it in the notebook [imdb_wrangling.ipynb](https://github.com/NicolaBagala/portfolio/blob/master/imdb/imdb_wrangling.ipynb), found in the same repository as this one.

Some of the charts in this notebook are "convenience" charts made on-the-fly to spot interesting patterns, as opposed to more sophisticated charts intended for presentations.

(**Note:** this is a code-free version of the project. If you want to look at my code, see [imdb_exploration.ipynb](https://github.com/NicolaBagala/portfolio/blob/master/imdb/imdb_exploration.ipynb) in this same repository.)

## The dataset

Let's quickly familiarise ourselves with the dataset. It contains 20 columns, briefly summarised below:

- `ID` is a unique alphanumeric string automatically assigned by IMDB to each movie.
- `TITLE` is the original title of a movie.
- `RELEASE_YEAR` is the year when a movie was [first released](https://help.imdb.com/article/contribution/titles/release-dates/GVUUDEPJNAW6G35P#) to the public.
- `GENRE` is a string specifying the genres that apply to a movie.
- `LENGTH_MIN` is the length of a movie, in minutes.
- `COUNTRY` is a string specifying the origin country (or countries) of a movie.
- `BUDGET_USD` and `GLOBAL_GROSS_USD` are the budget and worldwide gross of a movie, expressed in inflation-adjusted 2021 United States dollars.
- `WAVG_SCORE` is a weighted average of the scores that IMDB users have given to the movie. This value is calculated by IMDB according to a formula that [they chose not to disclose](https://help.imdb.com/article/imdb/track-movies-tv/weighted-average-ratings/GWT2DSBYVT2F25SK?ref_=helpsect_pro_2_8#). In the text, I sometimes refer to this score as the 'IMDB score'.
- `VOTES` is the number of votes that a movie has received from IMDB users. (Users can vote only once for each given movie, so `VOTES` is equivalent to the number of IMDB users who cast a vote on a given movie.)
- `M_AVG_SCORE` and `F_AVG_SCORE` contain the average score given to each movie by male and female IMDB users, respectively.
- `M_017_AVG_SCORE` to `M_45PLUS_AVG_SCORE` break down a movie's average score by age brackets for male voters (`45PLUS` means age 45 and above). For example, if `M_017_AVG_SCORE` was 7.3 for *The Prestige*, it would mean that the average score that IMDB male users between the ages of 0 and 17 gave to that movie is 7.3. Similarly, columns `F_017_AVG_SCORE` to `F_45PLUS_AVG_SCORE` refer to female users only.

## Exploring the dataset

Broadly speaking, we'll deal with three kinds of questions, plus any further questions that may originate from them as we go:

- questions regarding movie budgets and revenues,
- questions about movie length, and
- questions about movie scores.

### Of budgets and revenues

In this section, we're going to try to answer three main questions:

1. **How did movie budgets change over time?**
2. **How did movie revenues change over time?**
3. **Is there a correlation between movie budgets and movie revenues?**

(You'll notice I'm using the word "revenue" to refer to the global gross of movies, indicated in the dataset column `GLOBAL_GROSS_USD`. That's because a) "revenue" is shorter than "global gross", and b) the word "gross" is, well, gross.) 

Of course, changes in budgets and revenues can depend on many things—for example, it's possible that budgets would go up in some countries, but down in others. What we're aiming for is a broad overview, so in general, we're going to settle for worldwide averages. (Keep in mind that several budget and revenue values were intentionally eliminated from the dataset during the wrangling phase, because they were unsuitable for various reasons.)

#### Budgets go up...

We don't have budget information for all movies in the dataset: just short of 14000 movies come with that information, and the number of movies with a known budget varies from year to year.

![](figures/movies_with_budget.png)

_`Figure 1. Movies with budget vs movies in the dataset, by year.`_

This means that, for some years, we'll likely have a less accurate average budget, because we have far fewer budgets to calculate it from. With that caveat in mind, let's plot the average budget for each given year.
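For readers who'd like a rough idea of what this calculation might look like, here's a minimal pandas sketch. It is not taken from the project's notebook: the toy data is made up, and only the column names `RELEASE_YEAR` and `BUDGET_USD` come from the dataset described above. It keeps the number of known budgets next to each yearly average, so that years with very few budgets are easy to spot.

```python
import pandas as pd

# Toy stand-in for the real dataset; the values are invented for illustration.
movies = pd.DataFrame({
    "RELEASE_YEAR": [1999, 1999, 1999, 2000, 2000],
    "BUDGET_USD": [10e6, 30e6, None, 50e6, None],
})

# "count" ignores missing values, so it yields the number of *known* budgets;
# "mean" likewise averages only the budgets that are present.
budget_by_year = movies.groupby("RELEASE_YEAR")["BUDGET_USD"].agg(
    known_budgets="count",
    avg_budget="mean",
)
print(budget_by_year)
```

A year with a small `known_budgets` value is one where the average should be taken with a grain of salt.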

![](figures/avg_budget_by_year.png)

_`Figure 2. Average movie budget by year.`_

At a glance, it looks like **movie budgets have been growing somewhat steadily, though with a lot of ups and downs.** This might be for a number of reasons—for example, fewer (or more) movies might have been produced in a given year, or maybe they were less (or more) expensive compared to movies from other years.

#### ...but revenues go down

Next, we can look into how movie revenues have been changing over the years. As before, we don't know the revenue for every movie in the dataset. Before the 60s, there's basically no revenue information to speak of.

![](figures/movies_with_revenue.png)

_`Figure 3. Movies with revenue vs movies in the dataset, by year.`_

If we plot the average worldwide movie revenue by year, you can immediately tell that something's off.

![](figures/revenue_by_year.png)

_`Figure 4. Average movie revenue by year.`_

There are _huge_ spikes around the 1940s, and while there may well be ridiculously high-grossing outliers from that time ([_Gone with the Wind_](https://en.wikipedia.org/wiki/Gone_with_the_Wind_(film)) being one), this is likely an artefact due to the lack of revenue data for that time period. Let's have a look at what's causing the spikes; then we'll try to restrict the chart to a more sensible time frame. The table below shows the 10 top-grossing movies from 1937 to 1942.

![](figures/revenue_outliers.png)

_`Table 1. Ten top-grossing movies from 1937 to 1942.`_

The ten movies above made a lot of money, sometimes on the order of _billions!_ However, only a handful of movies from the same period come with revenue data, and this is what's causing the absurdly huge spikes. From the chart in `Figure 3`, it is difficult to tell exactly how many movies come with revenue data each year, so let's choose a threshold manually. Around the time of the spikes, there are only about 8 known revenues per year; that is very few, so let's say instead that we want at least 25 known revenues in a year to bother calculating the average revenue for that year, and let's plot that subset of the data.
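As a rough illustration of this kind of filter, here's a minimal pandas sketch. The helper name `revenue_by_year` and the toy data are hypothetical; only the column names and the threshold of 25 come from the text above, and the notebook may well do this differently.

```python
import pandas as pd

MIN_KNOWN_REVENUES = 25  # threshold chosen in the text


def revenue_by_year(movies: pd.DataFrame,
                    min_known: int = MIN_KNOWN_REVENUES) -> pd.DataFrame:
    """Mean and median revenue per year, keeping only years with enough data."""
    stats = movies.groupby("RELEASE_YEAR")["GLOBAL_GROSS_USD"].agg(
        known="count",    # number of non-missing revenues in that year
        mean="mean",
        median="median",
    )
    return stats[stats["known"] >= min_known]


# Toy data: 1940 has a single (huge) known revenue, 1990 has three.
toy = pd.DataFrame({
    "RELEASE_YEAR": [1940, 1940, 1990, 1990, 1990],
    "GLOBAL_GROSS_USD": [2e9, None, 1e6, 3e6, 20e6],
})

# A lower threshold is used here only because the toy data is tiny.
restricted = revenue_by_year(toy, min_known=3)
print(restricted)
```

With the toy threshold, 1940 is dropped because its average would rest on a single movie, which is exactly the situation that produces the spikes.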

![](figures/avg_median_revenue_by_year_restricted.png)

_`Figure 5. Average and median movie revenue by year. (Only years with 25 or more known revenues.)`_

Naturally, the same trend we saw in `Figure 4` is still there—it seems that **movie revenues are slowly going down on average.** This is not related to how many data points we have: especially after the year 2000, there's plenty of revenue information for all years, except for 2020 (which is around the time when the original dataset was created). Even the median revenue per year (in red) shows a decline spanning from the late 70s to the year 2000. However, before and after this timeframe, the median is essentially flat; this is true also for the median across _all_ available revenues.

![](figures/median_revenue_by_year.png)

_`Figure 6. Median movie revenue by year. (All years in the dataset.)`_

Barring the exceptional spikes from the 40s and before, it seems that the median movie revenue has been virtually the same, around 1.000.000 2021-USD, _except_ for a sudden peak in the late 70s that took over 20 years to fully disappear. In the chart below, we're showing revenues from three time frames: 1945-1975, 1976-2000, and 2001-2020. The maximum revenue, represented by the end of each top whisker, is very similar across the three time frames, and the three average revenues (in blue) aren't that far apart from each other either. However, the three median revenues (in red) all have different orders of magnitude: the 1945-1975 median is around \\$460.000; the 1976-2000 median is about \\$11 million, and the 2001-2020 median is around \\$1 million. This means, for example, that 50\% of the movies from 1976-2000 grossed at least \\$11 million. Seventy-five percent of the revenues in each time frame lie below the upper edge of its golden box, and the 1976-2000 median sits above the upper edges of the other two boxes, so **50\% of the movies from 1976-2000 had a higher revenue than 75\% of the movies from both the other two time frames!**
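The comparison between medians and upper quartiles can be sketched in pandas as follows. This is only an illustration under my own assumptions: the revenues are made-up toy numbers, and the bin edges are simply one way to encode the three time frames named above.

```python
import pandas as pd

# Toy revenues, invented so that the middle period clearly stands out.
toy = pd.DataFrame({
    "RELEASE_YEAR": [1950, 1960, 1970, 1980, 1990, 2000, 2001, 2010],
    "GLOBAL_GROSS_USD": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0, 4.0, 6.0],
})

# Right-inclusive bins: (1944, 1975], (1975, 2000], (2000, 2020].
periods = pd.cut(
    toy["RELEASE_YEAR"],
    bins=[1944, 1975, 2000, 2020],
    labels=["1945-1975", "1976-2000", "2001-2020"],
)

# One row per period, one column per quantile (0.25, 0.5, 0.75).
quartiles = (
    toy.groupby(periods, observed=True)["GLOBAL_GROSS_USD"]
       .quantile([0.25, 0.5, 0.75])
       .unstack()
)

# Is the 1976-2000 median above the 75th percentile of both other periods?
median_peak = quartiles.loc["1976-2000", 0.5]
others_q3 = quartiles.loc[["1945-1975", "2001-2020"], 0.75]
print(quartiles)
print((median_peak > others_q3).all())
```

The final check is the box-plot argument in code form: if the median of one group exceeds the third quartile of another, half of the first group outperforms three quarters of the second.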

![](figures/revenue_box_plot.png)

_`Figure 7. Mean, median and quartiles of movie revenues from 1945-1975, 1976-2000, and 2001-2020.`_

In short, this suggests that **movies between 1976 and 2000 generally grossed more than movies from the other two time periods**, although the rather different number of revenues available might be affecting this result. Movies outside the 1976-2000 peak _did_ gross less on average, but the peak may well be just an exception. What might have caused it is an interesting question, but it's beyond the scope of this project.

#### Spend more, earn more?

It's fair to wonder whether more expensive movies tend to make more money than less expensive ones. To try to answer that question, we can look for a correlation between movie budgets and movie revenues. There are 7148 movies with both budget and revenue information in the dataset, and the Pearson correlation coefficient between these two values is just below 0.6. That is a moderate positive correlation, suggesting that the two values may indeed go up or down together in a linear fashion. However, a quick scatter plot shows that most movies cluster to the bottom left of the chart.

![](figures/budget_vs_revenue_scatter.png)

_`Figure 8. Movie revenue vs movie budget.`_
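For reference, a Pearson coefficient like the one quoted above can be computed in pandas along these lines. This is a minimal sketch on invented toy data, not the notebook's code; the point is that only movies with _both_ values known should enter the calculation.

```python
import pandas as pd

# Toy data with some missing budgets and revenues.
toy = pd.DataFrame({
    "BUDGET_USD": [1e6, 5e6, None, 50e6, 100e6],
    "GLOBAL_GROSS_USD": [2e6, None, 8e6, 40e6, 300e6],
})

# Keep only the rows where both budget and revenue are known.
both = toy.dropna(subset=["BUDGET_USD", "GLOBAL_GROSS_USD"])

# Series.corr uses the Pearson method by default.
r = both["BUDGET_USD"].corr(both["GLOBAL_GROSS_USD"])
print(len(both), round(r, 3))
```

Keep in mind that Pearson only measures _linear_ association, so a crowded scatter plot can hide quite a bit of structure behind a single number.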

While we can see a slight upward trend as the budget grows, the jungle of points in the lower-left quadrant makes it difficult to spot any pattern. We can try to "zoom in" on movies with a budget below 200.000.000 USD, and take a smaller random sample to avoid an overly crowded chart.

![](figures/budget_vs_revenue_scatter_zoomed.png)

_`Figure 9. Movie revenue vs budget. (Random sample of half the movies with budgets below $200.000.000.)`_

There's a bit of an uptick past about 100.000.000 USD, but the situation is still too crowded to make sense of it with a scatter plot. Another option is to subdivide our budget range into convenient intervals, group together budgets that fall in the same interval, and calculate their average revenue.
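Here's a minimal sketch of that grouping, using `pandas.cut`. The interval edges and labels are hypothetical (the text doesn't say which intervals were used), and the data is a made-up toy sample.

```python
import pandas as pd

toy = pd.DataFrame({
    "BUDGET_USD": [5e6, 20e6, 40e6, 80e6, 150e6, 250e6],
    "GLOBAL_GROSS_USD": [1e6, 30e6, 50e6, 90e6, 400e6, 900e6],
})

# Hypothetical interval edges; each bin is right-inclusive, e.g. (0, 50M].
edges = [0, 50e6, 100e6, 200e6, float("inf")]
labels = ["0-50M", "50-100M", "100-200M", "200M+"]

toy["BUDGET_RANGE"] = pd.cut(toy["BUDGET_USD"], bins=edges, labels=labels)

# Average revenue of the movies whose budget falls in each interval.
avg_revenue = (
    toy.groupby("BUDGET_RANGE", observed=True)["GLOBAL_GROSS_USD"].mean()
)
print(avg_revenue)
```

The choice of edges matters: very wide intervals smooth out the trend, while very narrow ones leave some groups with too few movies for the average to mean much.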

![](figures/revenue_by_budget_interval.png)

_`Figure 10. Average revenue by budget range.`_

The chart leaves no doubt that, on average, the higher the budget, the higher the revenue, especially for very high budgets. This applies only to the subset of movies with budget and revenue information we have available, though, and doesn't take into account the year the movie was released. (Many different factors may influence movie revenues, and it may well be that, for a particular year, movie revenues were low on their own, no matter the budget.)

### Does size matter?

Back in the 90s, about 90-100 minutes were pretty typical lengths for movies. However, one gets the impression that movies have been getting longer in more recent years, so it's fair to wonder whether movie length has been increasing in a consistent fashion over time. In this section, we're going to check that; additionally, we're going to see if longer movies are more expensive (which is reasonable to expect) and whether there's any correlation between the length of a movie and how well it scores.

![](figures/avg_length_by_year.png)

_`Figure 11. Average movie length by year.`_

The question of length was easy to settle: quite clearly, **the average movie length has been going up over the years.** The more jagged line up until about the 1930s is likely due to the low number of movies we have for that time frame, although it would be interesting to see which outliers from the early 1920s pushed the average movie length all the way up to present-day levels.

![](figures/length_outliers.png)

_`Figure 12. Average movie length between 1920 and 1930. The highlights show extreme outliers.`_

From the line chart above, two years stand out in particular: 1923 and 1924. Compared to other years of the decade, they have the highest percentages of movies whose length was above the average of the time, which was about 87 minutes.

![](figures/movies_above_avg_length.png)

_`Figure 13. Percentage of movies of length above each year's average, 1920-1930.`_

Now that we know where to look, we can find the five longest movies from those two years that were above average length.

![](figures/top-5_longest.png)

_`Figure 14. Length of top-5 longest movies from 1923 and 1924.`_

With a bit of a stretch, you could say that, in 1924, the top-5 longest movies were fairly comparable both in length and in how much longer than average they were; by contrast, **in 1923, the longest movie was about three times longer than the other four in the top-5!**

Next, we can have a look at any possible correlation between movie length and movie budget. Intuitively, we expect a strong positive correlation: the longer the movie, the higher its cost and thus its budget should be; instead, with a Pearson coefficient barely above 0.4, the correlation between the two is weak-ish—certainly weaker than I was expecting. If we look at a scatter plot over all movies, we see that up to some point between 150 and 200 minutes, budgets _do_ tend to increase with movie length; however, past that point, budget size decreases again, being relatively low for some length outliers. (Movies with no budget information are not shown.)

![](figures/length_vs_budget_scatter.png)

_`Figure 15. Movie budget vs movie length.`_

It makes sense to see whether budgets increase _on average_ with movie length. Let's categorise each movie based on its duration as follows:

- `Up to 1 hr`
- `1-2 hr`
- `2-3 hr`
- `3-4 hr`
- `Over 4 hr`

Unfortunately, this categorisation will produce rather different category sizes, but it's better than the alternative. Categories with comparable numbers of movies tend to include movies with length ranges that are either too short (such as 86-93 minutes) or too long (112-808 minutes!).

![](figures/length_vs_budget_line.png)

_`Figure 16. Average movie budget by length category.`_

The chart above seems to confirm what the scatterplot said: up to around 4 hours, the longer the movie, the higher the budget; however, the steepness of the line decreases with movie length, which means that **budget increases become more and more modest as the length of the movie goes up.** After the 4-hour mark, budgets drop.

Finally, we can see if the length of a movie and how high it scores are correlated in any way. Intuitively, we shouldn't expect them to be; and indeed, the Pearson coefficient is just above 0.2, so there's basically no (linear) relationship between the two variables, as shown by the scatterplot below: **below the 200-minute mark, virtually all possible scores are attained many, many times.** However, past the 400-minute mark, movies do tend to score fairly high. Still, these are exceptions, as movies that long are extremely rare, and are probably works of art whose high score is unlikely to depend on their length.

![](figures/length_vs_score_scatter.png)

_`Figure 18. Average movie score vs. movie length.`_

### IMDB scores inside out

Let's start with an overview of the movie scores on IMDB. What does the `WAVG_SCORE` distribution look like?

![](figures/score_distribution.png)

_`Figure 19. Distribution of movies in the dataset by their average IMDB score.`_

The distribution is fairly left-skewed. On average, the `WAVG_SCORE` of a movie on IMDB is 5.9; the median is slightly above that at 6.1, while the most frequent `WAVG_SCORE` is 6.4. This means around 25,000 movies on IMDB are considered to be about average, in that their score gravitates around the average score across the entire dataset. Slightly more movies could be defined as "good", that is, with an average score between about 6.5 and 7. Very good movies are rarer: the number of movies drops dramatically when the score is just above 7, and truly outstanding movies, scoring above 8, are extremely rare.

Another interesting, generic question we can ask ourselves is: **do older movies score worse or better than modern movies?** That's easy to see.

![](figures/avg_score_by_year.png)

_`Figure 20. Average movie score by year.`_

On average, movie scores seem to be plummeting with time. **Let's keep in mind, however, that this does not reflect how "picky" viewers from different times may be:** IMDB was not around [before 1990](https://www.britannica.com/topic/IMDb), so what we're seeing here is how modern viewers rate movies from different years, and it appears that, on average, they rate older movies more highly. This may be for a variety of reasons: it might be that there are only a few titles for older years and they all happen to be rated fairly highly; it might be that only movie _connoisseurs_ watch and rate older movies, and they may be more likely to rate them highly than the average modern viewer; or, it might be that older movies were actually better according to the average viewer.

#### Scores and budgets

Moving on to more specific questions, we can try to see whether throwing more money at a movie increases its chances of being successful. In more formal terms: **do higher budgets correlate with higher scores?** We can start by looking at the Pearson correlation coefficient: about 0.28. Any correlation, if at all present, is very weak. The scatter plot below, however, seems to suggest that rather than a _linear_ correlation, measured by the Pearson coefficient, the two variables might have a _logarithmic_ correlation: `WAVG_SCORE` does increase with `BUDGET_USD`, but the increase becomes smaller and smaller with larger budgets. (At some point, this must happen anyway, because the `WAVG_SCORE` can't be more than 10.)

![](figures/budget_vs_score_scatter.png)

_`Figure 21. Average score vs budget.`_

At a glance, most movies have budgets up to \\$100.000.000, yet their score range is rather dense and wide, spanning from the minimum all the way to the maximum. Most of the data points representing movies with budget up to a hundred million dollars cluster around score 6, suggesting that decent—and even very good—movies can be produced even with a relatively modest budget.

Movies with a budget larger than \\$100.000.000 are rarer, but they're also less likely to score below 5. Movies with extremely high budgets don't really break through the ceiling of score 8 to any significant extent; movies with more modest budgets do that much more often, but of course, lower-budget movies are vastly more common, so, statistically speaking, this is to be expected.

As we did before for different variables, we can group movies into budget groups, calculate the average score for each group, and see how this average varies.

![](figures/budget_vs_score_line.png)

_`Figure 22. Average score by budget group.`_

As the scatter plot in `Figure 21` suggested, the average IMDB score does grow with budget size, but the growth is indeed logarithmic-looking: faster at first, slower later on. This suggests that, **after a certain point, more money won't necessarily make the movie better**; not everything about movie quality depends on how much you can spend on it.

#### Scores and revenue

The next question we can tackle is whether a higher score correlates with a higher revenue, which seems reasonable to expect: the higher a movie is rated, the more people will likely be willing to pay to see it. As usual, let's first get a sense for it by checking the Pearson coefficient and a scatter plot. The coefficient is about 0.15, suggesting no linear correlation, though the two variables might be correlated in some other way.

![](figures/score_vs_revenue_scatter.png)

_`Figure 23. Movie revenue vs. average score.`_

There is certainly no shortage of movies that didn't gross a lot despite scoring very high. However, it does look like revenue increases with score: slowly at first, then faster and faster until about score 7.5, after which the revenue slightly decreases again. By grouping movies by score, we can see if this trend holds true on average.

![](figures/score_vs_revenue_line.png)

_`Figure 24. Average revenue by score group.`_

In general, the trend seems confirmed; however, we need to keep in mind that among movies that scored 7 or higher, there are a few outliers that happen to have high-to-very-high revenues, and if we excluded them, the curve wouldn't rise quite so steeply.

#### Average movie score by country

It would be interesting to find out what the average rating is for movies from each country; we can calculate this as the average `WAVG_SCORE` (so, an average of averages) of all movies that come from each country. In some cases, a movie was produced by more than one country, which means that such a movie will be counted in multiple averages. For example, if a movie was a joint collaboration between the US and the UK, its score would be factored into the average for both countries. Note that the country of origin of a movie isn't always known; we'll ignore these movies. To get an idea of the reliability of this average, we'll also keep track of how many movies there are for each country and the average number of votes per movie, calculated as the total number of votes for all movies of a given country divided by the total number of movies from that country. The results for the ten top-scoring countries can be seen in the table below.

![](figures/score_by_country_top_10.png)

_`Table 2. Top-10 countries by average score of their movies.`_
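For the curious, the per-country calculation described above could be sketched in pandas like this. Treat it as an illustration only: the toy data is invented, and I'm assuming that multiple countries are stored in `COUNTRY` as a single string separated by `", "`, which may not match the real dataset.

```python
import pandas as pd

toy = pd.DataFrame({
    "TITLE": ["A", "B", "C", "D"],
    "COUNTRY": ["USA, UK", "USA", None, "UK"],  # assumed ", " separator
    "WAVG_SCORE": [8.0, 6.0, 9.0, 7.0],
    "VOTES": [1000, 200, 50, 300],
})

per_country = (
    toy.dropna(subset=["COUNTRY"])                 # ignore unknown origins
       .assign(COUNTRY=lambda d: d["COUNTRY"].str.split(", "))
       .explode("COUNTRY")                         # one row per (movie, country)
       .groupby("COUNTRY")
       .agg(
           avg_score=("WAVG_SCORE", "mean"),       # average of averages
           movies=("TITLE", "count"),
           votes_per_movie=("VOTES", "mean"),      # total votes / movie count
       )
       .sort_values("avg_score", ascending=False)
)
print(per_country)
```

Movie `A` is a co-production, so after `explode` it contributes to both the `USA` and the `UK` averages, just as described in the text.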

Most of the 10 top-scoring countries in the table above aren't exactly the first that come to mind when you think of famous or very good movies. This doesn't mean that they *can't* produce good movies, but for most of them, we only know about a handful of movies they've produced, which is not enough to tell how good movies from those countries may be on average. We need to choose a cut-off point, a minimum number of movies that a country must have for us to consider it. Half of all countries in the dataset have far too few movies (21 at most). This is admittedly rather arbitrary, but let's choose the 75th percentile as the cut-off point, that is, let's keep only countries with at least 224 movies on record.
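A percentile-based cut-off of this kind can be sketched as follows. The movie counts are toy numbers and the country names are placeholders; whether the boundary value itself is kept (`>=`) or dropped (`>`) is a choice, and here I keep it.

```python
import pandas as pd

# Toy movie counts per (placeholder) country.
movie_counts = pd.Series(
    {"A": 3, "B": 10, "C": 21, "D": 200, "E": 500},
    name="movies",
)

# 75th percentile of the movie counts, used as the minimum to keep a country.
cutoff = movie_counts.quantile(0.75)
kept = movie_counts[movie_counts >= cutoff]
print(cutoff, list(kept.index))
```

A higher percentile gives more trustworthy averages at the cost of discarding more countries, which is why the choice is inevitably somewhat arbitrary.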

![](figures/score_by_country_top_10_filtered.png)

_`Table 3. Top-10 countries by average score of their movies. (Only countries with 224 or more movies.)`_

The list we got is a bit "better": there are many more movies per country, and the average number of votes per movie is decent too. Still, intuitively one would probably expect to see the USA somewhere among the top-10, yet it's not there. Where is it?

![](figures/score_by_country_bottom_5.png)

_`Table 4. Bottom-5 countries by average score of their movies. (Only countries with 224 or more movies.)`_

Nearly at the bottom, with a rather mediocre score. We can compare the distribution of IMDB average scores for the USA against that of the world in the chart below.

![](figures/usa_vs_world_score.png)

_`Figure 25. Proportion of movies across the average score range, USA vs. world.`_

Compared to the entire world, it seems the USA has a higher percentage of "bad" scores, i.e. below 6, and a lower percentage of scores above 6. Compare this with movies from the Soviet Union, which is at the top of our list:

![](figures/urss_vs_world_score.png)

_`Figure 26. Proportion of movies across the average score range, USSR vs. world.`_

The distribution is _dramatically_ different from that of the entire world, with over 70% of Soviet Union movies scoring on average around 7, and nearly 20% around 8. It may well be that Soviet Union movies are just that good, though the stark difference from the world distribution is a little suspect. A possible explanation may lie in _who_ watches these movies. For example, if they were only available in Russian without subtitles, it's possible that they're watched mostly by Russian people, who might rate them more highly than other people would because they can relate to them more.

If this were true, the fact that movies from _Russia_, instead, are at the bottom of the list just above the USA might seem a bit of a puzzle; it could be explained by the fact that Russia (as in, the modern Russian Federation) came to be in 1991, after the Soviet Union collapsed, so Russian movies are more recent and perhaps more likely to have reached a wider audience. It's also possible that country data might be mixed up and therefore causing errors: for example, the dataset contains movies from the Soviet Union released _after_ its collapse, and movies from Russia from well before it was created: the newest Soviet Union movie in the dataset is from 1995, and the oldest Russian movie in the dataset is from 1911.

In any case, checking whether there is any "country bias" in the votes is not possible, because the dataset doesn't contain very granular location data. (The original dataset only categorised voters as being either located in the US or not.)

#### Influence of voter sex and movie genre on scores

It's natural to wonder which movie genres rank the highest on IMDB. To figure it out, we can use the same kind of list we used to check average scores by country.

![](figures/score_by_genre.png)

_`Table 5. Movie genre by average score.`_

Just like before, some scores don't necessarily describe the overall quality of their respective genre, because they were calculated from too few movies. Only 25% of all genre scores are based on fewer than 1583 movies, so let's restrict ourselves to genres with at least that many movies on record.

![](figures/score_by_genre_filtered.png)

_`Table 6. Movie genre by average score. (Only genres with at least 1583 movies.)`_

None of the genres appear to be outstanding. The top-scoring half scores about average, and from there it's pretty much all downhill. However, it should be noted that these scores aren't representative of "pure" genres, because many of the movies in the dataset were tagged with several genres, so that, for example, the score of the same individual movie may have influenced both the overall `Thriller` and `Sci-Fi` scores. Regardless, according to the IMDB userbase, it seems the best three movie genres out there are biography, history, and war.

This per se isn't particularly interesting, but any difference in scoring between male and female voters might be. **Does the same list look any different in terms of score, if we break it down by sex?**

![](figures/score_by_sex.png)

_`Table 7. Movie genre by average score and average score by sex. (Only genres with at least 1583 movies.)`_

The columns `M_AVG_SCORE` and `F_AVG_SCORE` show the average score across all movies of each genre by voter sex; the values in the `M_F_DIFF` column have been calculated as `M_AVG_SCORE` minus `F_AVG_SCORE`. These differences are all fairly small, each being less than half a point, but they're all negative, which means that **on average, male voters scored each and every movie genre lower than female voters did.** This is true also of the genres we excluded because of their low movie count. With the data we have, it's difficult to even speculate as to why female voters seem to be ever so slightly more generous than males, but the fact they're _consistently_ so across all movie genres is interesting.
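The `M_F_DIFF` calculation can be illustrated with a short pandas sketch. The scores are invented, and as with countries, I'm assuming that a movie's genres are stored in `GENRE` as one string separated by `", "`; the real dataset may use a different format.

```python
import pandas as pd

toy = pd.DataFrame({
    "GENRE": ["Drama, War", "Comedy", "Drama"],  # assumed ", " separator
    "M_AVG_SCORE": [7.0, 5.5, 6.0],
    "F_AVG_SCORE": [7.4, 5.8, 6.2],
})

by_genre = (
    toy.assign(GENRE=toy["GENRE"].str.split(", "))
       .explode("GENRE")                        # one row per (movie, genre)
       .groupby("GENRE")[["M_AVG_SCORE", "F_AVG_SCORE"]]
       .mean()
)

# Negative values mean male voters scored the genre lower than female voters.
by_genre["M_F_DIFF"] = by_genre["M_AVG_SCORE"] - by_genre["F_AVG_SCORE"]
print(by_genre)
```

Checking `(by_genre["M_F_DIFF"] < 0).all()` is then a one-line way to test whether the difference has the same sign for every genre.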

The histograms below show the proportion of movies whose average score was 1, 2, etc., by sex. **The proportion of average "bad" scores (1-5) from female voters is nearly always lower than for males; vice versa, the proportion of average "good" scores (6-10) from females is higher.** For example, male users gave an average score between 7 and 8 to about 8% of the movies they rated; female users gave the same average score to 15% of the movies they rated!

![](figures/female_vs_male_score.png)

_`Figure 27. Proportion of movies by average score assigned by male (blue) and female (purple) voters.`_

The above is a broad overview that doesn't take into account voter age. It would be interesting to see if this phenomenon changes in any way when age cohorts are considered. Doing so across all movie genres is beyond the scope of this project, but to conclude this exploration, we can calculate how the average male and female scores change overall with age.

![](figures/female_vs_male_score_by_age.png)

_`Figure 28. Average IMDB score by age cohort and sex.`_

According to the chart above, **the average score given by a cohort of people of either sex to any movie, regardless of its genre or anything else, decreases as the cohort's age increases.** For example, we see that, on average, female IMDB voters between the ages of 18 and 29 rate movies just above 6.3; the same rating drops to less than 6.1 for female voters aged 30 to 44, and it drops even further for the next cohort.

This chart too suggests that females are slightly more generous when scoring movies than males are, and on top of that, males become more critical with age than females do. For some reason, females aged 17 and under appear a bit more critical than males of the same age range, but the difference is minimal and doesn't negate the overall trends observed.

## Conclusions

During this quick exploration of a dataset of nearly 90000 movies, we tried to answer a few questions about movie budgets, revenues, length, and scores. We found out that:

- on average, **budgets have slowly but fairly steadily grown** from the early 1900s to 2020;

- if a few outliers from the 1940s are excluded, on average **revenues grew slowly from the early 1900s to about 1980,** when they began to slowly taper off. However, when looking at the median revenue, we saw that it was a fairly flat line throughout the entire century, barring the aforementioned outliers and **a prominent peak around the 1980s that took about 20 years to disappear**. We saw that, indeed, movies from the 1976-2000 time frame often grossed much more than movies in 1945-1975 and 2001-2020;

- on average, **movies with higher budgets gross more**, which seems to be especially true in the case of very large budgets;

- **the average movie has become longer and longer over the past century**, from about 55 minutes in 1900 to over an hour and a half in 2020. **Longer movies are more expensive on average**, with the average budget growing up to almost 100 million USD for movies in the 3 to 4 hours range. Movies longer than four hours are rather rare, but their budget is closer to 20 million USD on average;

- **IMDB users rate older movies more highly than newer ones on average**. Movies from the 1930s score around 6.8 on average, but from there, the average rating plummets all the way down to 5.6 for movies from 2020;

- **Higher budgets show a very weak correlation with higher scores.** In particular, between 10 and 100 million USD, budgets don't seem to make much of a difference in terms of score, which gravitates around 6.3. With higher budgets, the average score reaches up to 7 points;

- **The revenue range for movies scoring up to 7 points is on average fairly narrow,** up to 50.000.000 USD. From that point on, there's a rather steep growth that takes the average revenue all the way up to over 300 million USD for movies scoring around 9. We shouldn't forget, however, that the presence of a few high-scoring, very high-grossing outliers skews this average upward;

- **Very few western countries are among the top-10 in terms of movie scores.** Somewhat surprisingly, Soviet Union movies have an average score of about 7.1 and are at the top of the list, while the United States is almost at the very bottom of the list of countries with enough movies on record, with a meagre average score of 5.6. It might be that, for some countries, the location of the voters influences the scores in one way or another (for example, if most voters of a certain movie all come from the same country that produced the movie), but this is not something we could verify because this kind of information was simply not available in the dataset;

- **On average, female voters score _every_ movie genre higher than male voters.** The difference is small but present for all genres, which suggests that male voters might be more "picky," and that they grow more so with age. On average, a movie scored by young males up to 17 years old scores 6.2, against 5.6 in the case of males aged 45 and older. Female voters too become "pickier" with age, but much less so: across all age cohorts, the difference between the maximum and minimum average score from female voters is about a mere 0.3.