## Operating on the entire DataFrame

- In the *Calling Series methods* recipe in `Chapter 1`, a variety of methods operated on a single column (a Series) of data
- When the same methods are called on a DataFrame, they perform the operation on every column at once

In [2]:
import pandas as pd
import numpy as np
pd.options.display.max_columns = 40

- Read in the movie dataset
- Inspect its basic descriptive attributes, `shape`, `size`, and `ndim`, and call the built-in `len` function on it

In [4]:
movie = pd.read_csv('data/movie.csv')
movie.shape

(4916, 28)

In [5]:
movie.size

137648

In [6]:
movie.ndim

2

In [7]:
len(movie)

4916
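
The relationship between these four values can be checked on a small frame. The sketch below uses a made-up DataFrame named `toy` (hypothetical, not part of the movie dataset) to show that `size` is the row count times the column count and that `len` returns the row count.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the attributes
toy = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0]})

n_rows, n_cols = toy.shape          # shape is a (rows, columns) tuple
assert toy.size == n_rows * n_cols  # size counts every cell
assert toy.ndim == 2                # a DataFrame has two dimensions
assert len(toy) == n_rows           # len counts rows, not cells
```

The same arithmetic holds for the movie data above: 4916 rows times 28 columns gives 137648 cells.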

- Use the `count` method to find the number of non-missing values for each column
- The output is a Series that now has the old column names as its index

In [8]:
movie.count()

color                        4897
director_name                4814
num_critic_for_reviews       4867
duration                     4901
director_facebook_likes      4814
actor_3_facebook_likes       4893
actor_2_name                 4903
actor_1_facebook_likes       4909
gross                        4054
genres                       4916
actor_1_name                 4909
movie_title                  4916
num_voted_users              4916
cast_total_facebook_likes    4916
actor_3_name                 4893
facenumber_in_poster         4903
plot_keywords                4764
movie_imdb_link              4916
num_user_for_reviews         4895
language                     4904
country                      4911
content_rating               4616
budget                       4432
title_year                   4810
actor_2_facebook_likes       4903
imdb_score                   4916
aspect_ratio                 4590
movie_facebook_likes         4916
dtype: int64
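
As a minimal sketch of what `count` reports, the example below builds a small hypothetical frame (`toy`, with invented column names) containing a few missing values. Subtracting the counts from the number of rows gives the number of missing values per column.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in two of the three columns
toy = pd.DataFrame({
    'title': ['A', 'B', None],
    'score': [7.5, np.nan, 6.0],
    'votes': [10, 20, 30],
})

counts = toy.count()          # non-missing values per column, as a Series
missing = len(toy) - counts   # missing values per column

print(counts)
print(missing)
```

The index of `counts` holds the column names `title`, `score`, and `votes`, mirroring the output for the movie dataset.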

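- The other methods that compute summary statistics, such as `min`, `max`, `mean`, `median`, and `std`, all return a similar Series, with the column names in the index and the computed results as the values
- In pandas 2.0 and later, these methods no longer drop non-numeric columns silently, so pass `numeric_only=True` when the DataFrame also holds string columns

In [9]:
movie.mean(numeric_only=True)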

num_critic_for_reviews       1.379889e+02
duration                     1.070908e+02
director_facebook_likes      6.910145e+02
actor_3_facebook_likes       6.312763e+02
actor_1_facebook_likes       6.494488e+03
gross                        4.764451e+07
num_voted_users              8.264492e+04
cast_total_facebook_likes    9.579816e+03
facenumber_in_poster         1.377320e+00
num_user_for_reviews         2.676688e+02
budget                       3.654749e+07
title_year                   2.002448e+03
actor_2_facebook_likes       1.621924e+03
imdb_score                   6.437429e+00
aspect_ratio                 2.222349e+00
movie_facebook_likes         7.348294e+03
dtype: float64
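
The shape of these results is easier to see on a small frame. The following sketch assumes a hypothetical DataFrame `toy` with one string column and two numeric columns. Each aggregation returns a Series whose index contains only the numeric column names.

```python
import pandas as pd

# Hypothetical data: one string column and two numeric columns
toy = pd.DataFrame({
    'title': ['A', 'B', 'C'],
    'duration': [90.0, 100.0, 110.0],
    'score': [6.0, 7.0, 8.0],
})

# numeric_only=True restricts each aggregation to the numeric columns
means = toy.mean(numeric_only=True)
medians = toy.median(numeric_only=True)
stds = toy.std(numeric_only=True)   # sample standard deviation (ddof=1)

print(means)
```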

- The `describe` method computes the count, mean, standard deviation, minimum, maximum, and quartiles of every numeric column in a single call

In [10]:
movie.describe()

| | num_critic_for_reviews | duration | director_facebook_likes | actor_3_facebook_likes | actor_1_facebook_likes | gross | num_voted_users | cast_total_facebook_likes | facenumber_in_poster | num_user_for_reviews | budget | title_year | actor_2_facebook_likes | imdb_score | aspect_ratio | movie_facebook_likes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 4867.0 | 4901.0 | 4814.0 | 4893.0 | 4909.0 | 4054.0 | 4916.0 | 4916.0 | 4903.0 | 4895.0 | 4432.0 | 4810.0 | 4903.0 | 4916.0 | 4590.0 | 4916.0 |
| mean | 137.988905 | 107.090798 | 691.014541 | 631.276313 | 6494.488491 | 47644510.0 | 82644.92 | 9579.815907 | 1.37732 | 267.668846 | 36547490.0 | 2002.447609 | 1621.923516 | 6.437429 | 2.222349 | 7348.294142 |
| std | 120.239379 | 25.286015 | 2832.954125 | 1625.874802 | 15106.986884 | 67372550.0 | 138322.2 | 18164.31699 | 2.023826 | 372.934839 | 100242700.0 | 12.453977 | 4011.299523 | 1.127802 | 1.40294 | 19206.016458 |
| min | 1.0 | 7.0 | 0.0 | 0.0 | 0.0 | 162.0 | 5.0 | 0.0 | 0.0 | 1.0 | 218.0 | 1916.0 | 0.0 | 1.6 | 1.18 | 0.0 |
| 25% | 49.0 | 93.0 | 7.0 | 132.0 | 607.0 | 5019656.0 | 8361.75 | 1394.75 | 0.0 | 64.0 | 6000000.0 | 1999.0 | 277.0 | 5.8 | 1.85 | 0.0 |
| 50% | 108.0 | 103.0 | 48.0 | 366.0 | 982.0 | 25043960.0 | 33132.5 | 3049.0 | 1.0 | 153.0 | 19850000.0 | 2005.0 | 593.0 | 6.6 | 2.35 | 159.0 |
| 75% | 191.0 | 118.0 | 189.75 | 633.0 | 11000.0 | 61108410.0 | 93772.75 | 13616.75 | 2.0 | 320.5 | 43000000.0 | 2011.0 | 912.0 | 7.2 | 2.35 | 2000.0 |
| max | 813.0 | 511.0 | 23000.0 | 23000.0 | 640000.0 | 760505800.0 | 1689764.0 | 656730.0 | 43.0 | 5060.0 | 4200000000.0 | 2016.0 | 137000.0 | 9.5 | 16.0 | 349000.0 |


- It is possible to specify exact quantiles in the `describe` method using the `percentiles` parameter

In [11]:
movie.describe(percentiles=[.01, .3, .99])

| | num_critic_for_reviews | duration | director_facebook_likes | actor_3_facebook_likes | actor_1_facebook_likes | gross | num_voted_users | cast_total_facebook_likes | facenumber_in_poster | num_user_for_reviews | budget | title_year | actor_2_facebook_likes | imdb_score | aspect_ratio | movie_facebook_likes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 4867.0 | 4901.0 | 4814.0 | 4893.0 | 4909.0 | 4054.0 | 4916.0 | 4916.0 | 4903.0 | 4895.0 | 4432.0 | 4810.0 | 4903.0 | 4916.0 | 4590.0 | 4916.0 |
| mean | 137.988905 | 107.090798 | 691.014541 | 631.276313 | 6494.488491 | 47644510.0 | 82644.92 | 9579.815907 | 1.37732 | 267.668846 | 36547490.0 | 2002.447609 | 1621.923516 | 6.437429 | 2.222349 | 7348.294142 |
| std | 120.239379 | 25.286015 | 2832.954125 | 1625.874802 | 15106.986884 | 67372550.0 | 138322.2 | 18164.31699 | 2.023826 | 372.934839 | 100242700.0 | 12.453977 | 4011.299523 | 1.127802 | 1.40294 | 19206.016458 |
| min | 1.0 | 7.0 | 0.0 | 0.0 | 0.0 | 162.0 | 5.0 | 0.0 | 0.0 | 1.0 | 218.0 | 1916.0 | 0.0 | 1.6 | 1.18 | 0.0 |
| 1% | 2.0 | 43.0 | 0.0 | 0.0 | 6.08 | 8474.8 | 53.0 | 6.0 | 0.0 | 1.94 | 60000.0 | 1951.0 | 0.0 | 3.1 | 1.33 | 0.0 |
| 30% | 60.0 | 95.0 | 11.0 | 176.0 | 694.0 | 7914069.0 | 11864.5 | 1684.5 | 0.0 | 80.0 | 8000000.0 | 2000.0 | 345.0 | 6.0 | 1.85 | 0.0 |
| 50% | 108.0 | 103.0 | 48.0 | 366.0 | 982.0 | 25043960.0 | 33132.5 | 3049.0 | 1.0 | 153.0 | 19850000.0 | 2005.0 | 593.0 | 6.6 | 2.35 | 159.0 |
| 99% | 546.68 | 189.0 | 16000.0 | 11000.0 | 44920.0 | 326412800.0 | 681584.6 | 62413.9 | 8.0 | 1999.24 | 200000000.0 | 2016.0 | 17000.0 | 8.5 | 4.0 | 93850.0 |
| max | 813.0 | 511.0 | 23000.0 | 23000.0 | 640000.0 | 760505800.0 | 1689764.0 | 656730.0 | 43.0 | 5060.0 | 4200000000.0 | 2016.0 | 137000.0 | 9.5 | 16.0 | 349000.0 |
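
To illustrate how the `percentiles` parameter maps to row labels, here is a small sketch on a hypothetical single-column frame `toy` holding the integers 1 through 101. Each requested percentile appears as a row label such as `30%`, and its value matches what the `quantile` method returns for the same column.

```python
import pandas as pd

# Hypothetical data: the integers 1..101 in a single column
toy = pd.DataFrame({'x': range(1, 102)})

summary = toy.describe(percentiles=[.01, .3, .99])
print(summary)

# Each percentile row agrees with Series.quantile for the same fraction
assert summary.loc['30%', 'x'] == toy['x'].quantile(.3)
assert summary.loc['99%', 'x'] == toy['x'].quantile(.99)
```

The default quartile rows `25%` and `75%` do not appear because they were not requested.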


## There's more...

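- The aggregation methods skip missing values by default (`skipna=True`). To see how the `skipna` parameter affects the outcome, set it to `False` and call `min`
- A column that contains any missing value now returns `NaN`, so only the numeric columns without missing values produce a result
- `numeric_only=True` restricts the calculation to numeric columns; without it, recent pandas versions may raise a `TypeError` on string columns that contain missing values

In [12]:
movie.min(skipna=False, numeric_only=True)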

num_critic_for_reviews       NaN
duration                     NaN
director_facebook_likes      NaN
actor_3_facebook_likes       NaN
actor_1_facebook_likes       NaN
gross                        NaN
num_voted_users              5.0
cast_total_facebook_likes    0.0
facenumber_in_poster         NaN
num_user_for_reviews         NaN
budget                       NaN
title_year                   NaN
actor_2_facebook_likes       NaN
imdb_score                   1.6
aspect_ratio                 NaN
movie_facebook_likes         0.0
dtype: float64
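
The effect of `skipna` can be isolated on a small example. The sketch below assumes a hypothetical frame `toy` with one complete column and one column that contains a missing value, and compares the default behaviour with `skipna=False`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one complete column, one with a missing value
toy = pd.DataFrame({
    'complete': [3.0, 1.0, 2.0],
    'has_gap': [5.0, np.nan, 4.0],
})

default_min = toy.min()              # skipna=True: missing values are ignored
strict_min = toy.min(skipna=False)   # any missing value makes the result NaN

print(default_min)
print(strict_min)
```

With the default, `has_gap` reports 4.0. With `skipna=False` it reports `NaN`, while `complete` is 1.0 in both cases.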