# data.world
## Netflix Shows

> #### Setup  

> Before running data.world notebooks for the first time, you'll need to:  
1. Install data.world's Python package, including optional `pandas` dependencies: 
```shell
pip install "datadotworld[pandas] @ git+https://github.com/datadotworld/data.world-py.git"
```
1. Obtain an API access token at https://data.world/settings/advanced
1. Store the API access token using the `dw` command-line tool:
```shell
dw configure
```

> Once your environment is set up, these steps do not need to be repeated for other data.world notebooks.
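
A quick way to confirm that a token is available before loading data is sketched below. It rests on two assumptions that this notebook does not state: that `dw configure` writes its settings to `~/.dw/config`, and that the library also accepts a token through the `DW_AUTH_TOKEN` environment variable. The helper `token_source` is hypothetical; it only reports where a token would be found and does not validate it.

```python
import os
from pathlib import Path


def token_source(env=None, home=None):
    """Report where a data.world token appears to be configured.

    Hypothetical helper. Assumes the DW_AUTH_TOKEN variable and the
    ~/.dw/config file are the two places the library looks.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    if env.get("DW_AUTH_TOKEN"):
        return "environment"
    if (home / ".dw" / "config").is_file():
        return "config file"
    return None


# None here suggests that `dw configure` has not been run yet.
print(token_source())
```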

In [31]:
import datadotworld as dw
import pandas as pd
import numpy as np
import tensorflow as tf

In [2]:
# Datasets are referenced by their path
dataset_key = 'chasewillden/netflix-shows'

# Or simply by their URL
dataset_key = 'https://data.world/chasewillden/netflix-shows'
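
The two key forms above name the same dataset. As a rough illustration, the sketch below reduces either form to `owner/id`. `normalize_dataset_key` is a hypothetical helper written for this note, not the parser that `datadotworld` uses internally.

```python
from urllib.parse import urlparse


def normalize_dataset_key(key):
    """Return 'owner/id' for an 'owner/id' path or a data.world URL.

    Hypothetical helper for illustration; not part of datadotworld.
    """
    if key.startswith(("http://", "https://")):
        raw_path = urlparse(key).path
    else:
        raw_path = key
    # Drop empty segments left by leading or trailing slashes.
    parts = [p for p in raw_path.split("/") if p]
    if len(parts) != 2:
        raise ValueError(f"expected owner/id, got {key!r}")
    return "/".join(parts)


path_key = normalize_dataset_key("chasewillden/netflix-shows")
url_key = normalize_dataset_key("https://data.world/chasewillden/netflix-shows")
print(path_key == url_key)  # True
```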

In [5]:
# Load dataset (onto the local file system)
dataset_local = dw.load_dataset(dataset_key)  # cached under ~/.dw/cache


LocalDataset('C:\\Users\\AndrewTran/.dw/cache\\chasewillden\\netflix-shows\\latest\\datapackage.json')


# Next steps

- Run `help()` to learn more ways to access and use your data. Try:
  - `help(dw.load_dataset)`
  - `help(dw.query)`
- Learn more at: https://github.com/datadotworld/data.world-py and https://docs.data.world

In [4]:
# See what is in it
dataset_local.describe()

{'description': 'The purpose of this dataset is to understand the rating distributions of Netflix shows.\n\n# Background\nNetflix in the past 5-10 years has captured a large populate of viewers. With more viewers, there most likely an increase of show variety. However, do people understand the distribution of ratings on Netflix shows? \n\n# Netflix Suggestion Engine\nBecause of the vast amount of time it would take to gather 1,000 shows one by one, the gathering method took advantage of the Netflix’s suggestion engine. The suggestion engine recommends shows similar to the selected show. As part of this data set, I took 4 videos from 4 ratings (totaling 16 unique shows), then pulled 53 suggested shows per video. The ratings include: G, PG, TV-14, TV-MA. I chose not to pull from every rating (e.g. TV-G, TV-Y, etc.).\n\n## Source\nAccess to the study can be found at [The Concept Center](http://theconceptcenter.com/simple-research-study-netflix-shows-analysis/)',
 'homepage': 'https://da

In [12]:
help(dw.load_dataset)
help(dw.query)

Help on function load_dataset in module datadotworld:

load_dataset(dataset_key, force_update=False, profile='default')
    Load a dataset from the local filesystem, downloading it from data.world
    first, if necessary.
    
    This function returns an object of type `LocalDataset`. The object
    allows access to metedata via it's `describe()` method and to all the data
    via three properties `raw_data`, `tables` and `dataframes`, all of which
    are mappings (dict-like structures).
    
    
    Parameters
    ----------
    dataset_key : str
        Dataset identifier, in the form of owner/id or of a url
    force_update : bool
        Flag, indicating if a new copy of the dataset should be downloaded
        replacing any previously downloaded copy
    profile : str, optional
        Configuration profile (account) to use.
    
    Returns
    -------
    LocalDataset
        The object representing the dataset
    
    Raises
    ------
    RestApiError
        If a server e

In [8]:
list(dataset_local.dataframes)

['by_show_by_rating_value',
 'by_year',
 'by_show_by_single_rating',
 'by_show_by_rating',
 'by_rating',
 'netflix']
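
Since the help text describes `dataframes` as a dict-like mapping, a table can be looked up by name, for example `dataset_local.dataframes['netflix']`. The sketch below uses a plain dict of tiny, made-up frames as a stand-in, so that it runs without a data.world account. The column names and values are invented for illustration.

```python
import pandas as pd

# Stand-in for dataset_local.dataframes: a plain dict of tiny, made-up frames.
dataframes = {
    "by_year": pd.DataFrame({"release_year": [2015, 2016], "count": [1, 2]}),
    "netflix": pd.DataFrame({
        "title": ["Show A", "Show B", "Show C"],
        "release_year": [2015, 2016, 2016],
    }),
}

# Look up one table by name, as with dataset_local.dataframes['netflix'].
netflix = dataframes["netflix"]

# Summarize every table: name -> (rows, columns).
shapes = {name: frame.shape for name, frame in dataframes.items()}
print(shapes)
```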

In [34]:
# Test run adapted from help(dw.query)
results = dw.query(dataset_key, 'SELECT * FROM netflix')
df = results.dataframe

# Only the last expression in a cell is echoed, so just the type appears below
df.shape
df.columns
type(df.columns)

pandas.core.index.Index
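
To see the shape and the column names as well as the type, each value can be printed explicitly. The sketch below uses a small, made-up frame in place of the query result, so the values are illustrative only.

```python
import pandas as pd

# Made-up stand-in for results.dataframe; the real frame has 1000 rows.
df = pd.DataFrame({
    "title": ["Show A", "Show B", "Show C"],
    "rating": ["PG", "TV-14", "TV-MA"],
    "release_year": [2014, 2016, 2012],
})

# print() shows every value, not just the last expression in the cell.
print(df.shape)                   # (rows, columns)
print(list(df.columns))           # column names as a plain list
print(type(df.columns).__name__)  # class name of the column index
```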

In [35]:
df.tail()
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 0 to 999
Data columns (total 7 columns):
rating               1000 non-null object
ratingdescription    1000 non-null int64
ratinglevel          941 non-null object
release_year         1000 non-null int64
title                1000 non-null object
user_rating_score    605 non-null float64
user_rating_size     1000 non-null int64
dtypes: float64(1), int64(3), object(3)
memory usage: 62.5+ KB
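
The non-null counts above show that two columns have gaps. The sketch below works out how many values are missing, using only the numbers reported by `df.info()`; nothing is read from the dataset itself.

```python
# Non-null counts copied from the df.info() output above.
total_rows = 1000
non_null = {
    "rating": 1000,
    "ratingdescription": 1000,
    "ratinglevel": 941,
    "release_year": 1000,
    "title": 1000,
    "user_rating_score": 605,
    "user_rating_size": 1000,
}

# Keep only the columns that have at least one missing value.
missing = {col: total_rows - n for col, n in non_null.items() if n < total_rows}

for col, n in missing.items():
    print(f"{col}: {n} missing ({n / total_rows:.1%})")
# ratinglevel: 59 missing (5.9%)
# user_rating_score: 395 missing (39.5%)
```

On the live frame, `df.isna().sum()` reports the same per-column counts directly.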
