# Pokemon

In this homework, you'll read, process, and group CSV data to compute descriptive statistics in two ways for each problem: with the Pandas library and without the Pandas library.

In [1]:
import doctest
import io
import pandas as pd

# For prettifying doctest output involving data structures
# See also: https://stackoverflow.com/a/21227671
from pprint import pprint

In the *[Pokémon](https://en.wikipedia.org/wiki/Pok%C3%A9mon)* video game series, the player catches **pokemon**, fictional creatures trained to battle each other as part of a sport franchise. For this first task, you'll practice creating your own pokemon-themed CSV dataset in the following format.

In [2]:
pokemon_box = pd.read_csv("pokemon_box.csv")
pokemon_box

Unnamed: 0,id,name,level,personality,type,weakness,atk,def,hp,stage
0,53,Persian,40,mild,normal,fighting,104,116,147,2
1,126,Magmar,44,docile,fire,water,96,83,153,1
2,99,Kingler,33,adamant,water,electric,110,169,29,2
3,57,Primeape,9,lonely,fighting,flying,20,66,43,2
4,3,Venusaur,44,sassy,grass,fire,136,195,92,3
...,...,...,...,...,...,...,...,...,...,...
110,76,Golem,78,hardy,rock,water,65,145,137,3
111,116,Horsea,69,mild,water,electric,49,36,45,1
112,6,Charizard,89,lax,fire,water,165,100,108,3
113,65,Alakazam,33,impish,psychic,dark,67,39,169,3


- `id` is a unique numeric identifier corresponding to the species of a pokemon.
- `name` is the name of the species of pokemon, such as Bulbasaur.
- `level` is the integer level of the pokemon.
- `personality` is a one-word string describing the personality of the pokemon, such as Jolly.
- `type` is a one-word string describing the type of the pokemon, such as Grass.
- `weakness` is the enemy type that this pokemon is weak toward. Bulbasaur is weak to fire-type pokemon.
- `atk`, `def`, `hp` are integers that indicate the attack power, defense power, and hit points of the pokemon.
- `stage` is an integer that indicates the particular developmental stage of the pokemon.

> Assume the **data is never empty** (there's at least one pokemon), that there's **no missing data** (each pokemon has every attribute), and **pokemon stats can be any non-negative integers, including 0**.

This assessment introduces a new way of validating and testing your data programs by comparing two different approaches to implementing the same function: writing an implementation once using plain Python and again using Pandas. For each programming task below, you'll write, document, and test each function in the same way to build confidence in their correctness and robustness.
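One way to picture the comparison idea is a small helper that runs two implementations on the same inputs and checks that they agree. The sketch below is illustrative only: `check_agreement`, `count_unique_loop`, and `count_unique_set` are hypothetical names that are not part of the assignment. Both stand-ins are plain Python so the example runs without any dataset; in this assessment, the second implementation would be the Pandas version.

```python
def check_agreement(first_impl, second_impl, inputs):
    """Assert that two implementations return equal results on every input."""
    for value in inputs:
        expected = first_impl(value)
        actual = second_impl(value)
        assert expected == actual, f"mismatch on {value!r}: {expected!r} != {actual!r}"
    return True


# Hypothetical stand-in: counts unique names with an explicit loop.
def count_unique_loop(names):
    seen = []
    for name in names:
        if name not in seen:
            seen.append(name)
    return len(seen)


# Hypothetical stand-in: counts unique names with a set.
def count_unique_set(names):
    return len(set(names))


samples = [["Fido", "Meowrty", "Fido"], ["Phil"], []]
agreed = check_agreement(count_unique_loop, count_unique_set, samples)
print(agreed)  # True
```

If the two functions ever disagree, the assertion message names the input that exposed the difference, which is usually the fastest way to find the bug.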

In addition to the large `pokemon_box` dataset above, we've provided a much smaller `pokemon_test` dataset below.

In [3]:
pokemon_test = pd.read_csv(io.StringIO("""
id,name,level,personality,type,weakness,atk,def,hp,stage
59,Arcanine,35,impish,fire,water,50,55,90,2
59,Arcanine,35,gentle,fire,water,45,60,80,2
121,Starmie,67,sassy,water,electric,174,56,113,2
131,Lapras,72,lax,water,electric,107,113,29,1
"""))
pokemon_test

Unnamed: 0,id,name,level,personality,type,weakness,atk,def,hp,stage
0,59,Arcanine,35,impish,fire,water,50,55,90,2
1,59,Arcanine,35,gentle,fire,water,45,60,80,2
2,121,Starmie,67,sassy,water,electric,174,56,113,2
3,131,Lapras,72,lax,water,electric,107,113,29,1


Note that multiple pokemon can have very similar attributes. In the `pokemon_test` dataset, there are two pokemon named "Arcanine" with the same `id`, `level`, and `type` that differ only in `personality`, `atk`, `def`, and `hp`. Since there's no clearly unique key to use as an index, we won't define a meaningful index for this assessment.

## Outside Sources

Update the following Markdown cell to include your name and list your outside sources. Submitted work should be consistent with the curriculum and your sources.

**Name**: Ananya Shreya Soni

## Task: Create your own dataset

Before starting your programming tasks, create at least one additional testing dataset below. In total, each function you write should contain 3 tests:

1. One test for the large `pokemon_box` dataset.
2. One test for the small `pokemon_test` dataset.
3. One test for your own `pokemon_mine` dataset below.

In [4]:
pokemon_mine = pd.read_csv(io.StringIO("""
id,name,level,personality,type,weakness,atk,def,hp,stage
131,Lapras,95,lax,water,electric,107,113,29,1
139,Omastar,31,relaxed,water,electric,138,26,21,2
1,Pikachu,95,nice,electric,water,50,55,1000,2
1,Pikachu,95,ferocious,electric,water,45,60,80,2
19,Rattata,25,serious,normal,fighting,47,84,183,1
51,Dugtrio,10,bashful,ground,water,142,176,120,2
"""))
pokemon_mine

Unnamed: 0,id,name,level,personality,type,weakness,atk,def,hp,stage
0,131,Lapras,95,lax,water,electric,107,113,29,1
1,139,Omastar,31,relaxed,water,electric,138,26,21,2
2,1,Pikachu,95,nice,electric,water,50,55,1000,2
3,1,Pikachu,95,ferocious,electric,water,45,60,80,2
4,19,Rattata,25,serious,normal,fighting,47,84,183,1
5,51,Dugtrio,10,bashful,ground,water,142,176,120,2


## Task: Species count

Write a function `python_species_count` that takes a list of dictionaries representing the pokemon dataset and returns the number of unique pokemon species in the dataset as determined by the `name` attribute without using Pandas.

Write a function `pandas_species_count` that does the same thing but using a `DataFrame` as input.

Add your test case and a descriptive docstring for both functions.
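The doctests in this notebook build the list of dictionaries with `DataFrame.to_dict("records")`. As a hedged aside, the same record structure can be produced without Pandas using the standard `csv` module. The sketch below uses the small pets dataset from elsewhere in this notebook rather than pokemon data, and the variable names are hypothetical. Note that `csv.DictReader` yields every value as a string, so numeric columns need an explicit conversion.

```python
import csv
import io

pets_csv = """name,age,species
Fido,4,dog
Meowrty,6,cat
Chester,1,dog
Phil,1,axolotl
"""

# Build a list of dictionaries, one per CSV row, converting age to int.
records = []
for row in csv.DictReader(io.StringIO(pets_csv)):
    row["age"] = int(row["age"])
    records.append(row)

# Counting unique values of one attribute is a set-building problem.
unique_species = {record["species"] for record in records}
species_total = len(unique_species)
print(species_total)  # 3
```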

In [5]:
def python_species_count(data):
    """
    Given a list of records containing pokemon data, returns the number
    of unique pokemon species (i.e., the number of unique pokemon names)

    >>> python_species_count(pokemon_box.to_dict("records"))
    82
    >>> python_species_count(pokemon_test.to_dict("records"))
    3
    >>> python_species_count(pokemon_mine.to_dict("records"))
    5
    """
    species_set = set()
    for record in data:
        species_set.add(record["name"])
    return len(species_set)

doctest.run_docstring_examples(python_species_count, globals())

In [6]:
def pandas_species_count(data):
    """
    Given a data frame containing pokemon data, returns the number
    of unique pokemon species (i.e., the number of unique pokemon names)

    >>> pandas_species_count(pokemon_box)
    82
    >>> pandas_species_count(pokemon_test)
    3
    >>> pandas_species_count(pokemon_mine)
    5
    """
    return len(data["name"].unique())


doctest.run_docstring_examples(pandas_species_count, globals())

## Task: Max level

Write a function `python_max_level` that takes a list of dictionaries representing the pokemon dataset and returns a 2-element tuple for the `(name, level)` of the pokemon with the highest `level` in the dataset. If there are multiple pokemon with the highest `level`, return the pokemon that appears first in the dataset.

Write a function `pandas_max_level` that does the same thing but using a `DataFrame` as input.

Add your test case and a descriptive docstring for both functions.
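The tie-breaking rule above (return the first pokemon with the highest level) matches how Python's built-in `max` behaves: when several items are maximal, it returns the first one encountered. The sketch below illustrates this on a hypothetical pets list in which two pets share the highest age; it is one possible approach, not the required one.

```python
# Hypothetical data: Meowrty and Chester are tied for the highest age.
pets = [
    {"name": "Fido", "age": 4},
    {"name": "Meowrty", "age": 6},
    {"name": "Chester", "age": 6},
]

# max() with a key function returns the first maximal item on ties.
oldest = max(pets, key=lambda pet: pet["age"])
result = (oldest["name"], oldest["age"])
print(result)  # ('Meowrty', 6)
```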

In [7]:
def python_max_level(data):
    """
    Given a list of records containing pokemon data, returns the
    first pokemon in the dataset with the highest level along
    with its level in the format: (name, level)

    >>> python_max_level(pokemon_box.to_dict("records"))
    ('Victreebel', 100)
    >>> python_max_level(pokemon_test.to_dict("records"))
    ('Lapras', 72)
    >>> python_max_level(pokemon_mine.to_dict("records"))
    ('Lapras', 95)
    """
    max_level = -1
    max_pokemon = None
    for record in data:
        pokemon = record["name"]
        level = record["level"]
        if level > max_level:
            max_level = level
            max_pokemon = pokemon
    return (max_pokemon, max_level)


doctest.run_docstring_examples(python_max_level, globals())

In [8]:
def pandas_max_level(data):
    """
    Given a data frame containing pokemon data, returns the
    first pokemon in the dataset with the highest level along
    with its level in the format: (name, level)

    >>> pandas_max_level(pokemon_box)
    ('Victreebel', 100)
    >>> pandas_max_level(pokemon_test)
    ('Lapras', 72)
    >>> pandas_max_level(pokemon_mine)
    ('Lapras', 95)
    """
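    # idxmax returns the label of the first row holding the maximum level,
    # so ties resolve to the earliest pokemon without relying on column order.
    first_max_row = data.loc[data["level"].idxmax()]
    return (first_max_row["name"], int(first_max_row["level"]))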

doctest.run_docstring_examples(pandas_max_level, globals())

## Task: Filter range

Write a function `python_filter_range` that takes a list of dictionaries representing the pokemon dataset and two integers: a lower bound (inclusive) and an upper bound (exclusive). The function should return a list of the names of pokemon whose `level` falls within the bounds, in the same order that they appear in the dataset.

Write a function `pandas_filter_range` that does the same thing but using a `DataFrame` as input. To convert a `Series` to a `list`, use the built-in `list` function as shown below.

```python
csv = """
name,age,species
Fido,4,dog
Meowrty,6,cat
Chester,1,dog
Phil,1,axolotl
"""
data = pd.read_csv(io.StringIO(csv))

list(data['name'])
# ['Fido', 'Meowrty', 'Chester', 'Phil']

list(data.loc[1])
# ['Meowrty', 6, 'cat']
```

Add your test case and a descriptive docstring for both functions.
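To make the inclusive lower bound and exclusive upper bound concrete, here is a minimal sketch on the pets data using a chained comparison. The function name `names_in_age_range` is hypothetical and the example filters on `age` rather than `level`.

```python
pets = [
    {"name": "Fido", "age": 4},
    {"name": "Meowrty", "age": 6},
    {"name": "Chester", "age": 1},
    {"name": "Phil", "age": 1},
]


def names_in_age_range(records, lower, upper):
    """Return names whose age satisfies lower <= age < upper, in input order."""
    return [record["name"] for record in records if lower <= record["age"] < upper]


in_range = names_in_age_range(pets, 1, 6)
print(in_range)  # ['Fido', 'Chester', 'Phil']
```

Meowrty is excluded because an age of 6 equals the exclusive upper bound, while Chester and Phil are included because an age of 1 equals the inclusive lower bound.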

In [9]:
def python_filter_range(data, lower, upper):
    """
    Given a list of records containing pokemon data, a lower bound,
    and an upper bound, returns the names of the pokemon in the dataset
    whose level is greater than or equal to the lower bound and less
    than the upper bound, in the order they appear in the dataset

    >>> pprint(python_filter_range(pokemon_box.to_dict("records"), 0, 10))
    ['Primeape',
     'Metapod',
     'Caterpie',
     'Ninetales',
     'Weezing',
     'Tangela',
     'Butterfree',
     'Exeggcute',
     'Arcanine']
    >>> pprint(python_filter_range(pokemon_test.to_dict("records"), 35, 72))
    ['Arcanine', 'Arcanine', 'Starmie']
    >>> pprint(python_filter_range(pokemon_mine.to_dict("records"), 95, 100))
    ['Lapras', 'Pikachu', 'Pikachu']
    """
    pokemon = []
    for record in data:
        pokemon_name = record["name"]
        level = record["level"]
        if lower <= level < upper:
            pokemon.append(pokemon_name)
    return pokemon


doctest.run_docstring_examples(python_filter_range, globals())

In [10]:
def pandas_filter_range(data, lower, upper):
    """
    Given a data frame containing pokemon data, a lower bound,
    and an upper bound, returns the names of the pokemon in the dataset
    whose level is greater than or equal to the lower bound and less
    than the upper bound, in the order they appear in the dataset

    >>> pprint(pandas_filter_range(pokemon_box, 0, 10))
    ['Primeape',
     'Metapod',
     'Caterpie',
     'Ninetales',
     'Weezing',
     'Tangela',
     'Butterfree',
     'Exeggcute',
     'Arcanine']
    >>> pprint(pandas_filter_range(pokemon_test, 35, 72))
    ['Arcanine', 'Arcanine', 'Starmie']
    >>> pprint(pandas_filter_range(pokemon_mine, 95, 100))
    ['Lapras', 'Pikachu', 'Pikachu']
    """
    return list(data[(data["level"] >= lower) & (data["level"] < upper)]["name"])


doctest.run_docstring_examples(pandas_filter_range, globals())

## Task: Mean attack for type

Write a function `python_mean_attack_for_type` that takes a list of dictionaries representing the pokemon dataset and a `str` representing the pokemon `type`. The function should return the average `atk` for all the pokemon in the dataset with the given `type`. If there are no pokemon of the given `type`, return `None`.

Write a function `pandas_mean_attack_for_type` that does the same thing but using a `DataFrame` as input.

Add your test case and a descriptive docstring for both functions.
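The "return `None` when nothing matches" requirement needs an explicit check, because averaging an empty collection is an error in plain Python: dividing by a zero count raises `ZeroDivisionError`, and `statistics.fmean` raises `StatisticsError`. The sketch below shows one way to guard against that on the pets data; `mean_age_for_species` is a hypothetical name, and `fmean` (Python 3.8+) is used because it always returns a float.

```python
from statistics import fmean

pets = [
    {"name": "Fido", "age": 4, "species": "dog"},
    {"name": "Meowrty", "age": 6, "species": "cat"},
    {"name": "Chester", "age": 1, "species": "dog"},
    {"name": "Phil", "age": 1, "species": "axolotl"},
]


def mean_age_for_species(records, species):
    """Return the mean age for one species, or None if no record matches."""
    ages = [record["age"] for record in records if record["species"] == species]
    if not ages:
        # fmean would raise StatisticsError on an empty list.
        return None
    return fmean(ages)


dog_mean = mean_age_for_species(pets, "dog")
missing_mean = mean_age_for_species(pets, "dragon")
print(dog_mean)  # 2.5
print(missing_mean)  # None
```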

In [11]:
def python_mean_attack_for_type(data, pokemon_type):
    """
    Given a list of records containing pokemon data, and a pokemon type,
    returns the mean atk for pokemon in the data set with type matching
    the provided pokemon type or None if there are 0 pokemon with the
    specified type in the data set

    >>> python_mean_attack_for_type(pokemon_box.to_dict("records"), "water")
    99.75
    >>> python_mean_attack_for_type(pokemon_test.to_dict("records"), "fire")
    47.5
    >>> python_mean_attack_for_type(pokemon_test.to_dict("records"), "air")
    >>> python_mean_attack_for_type(pokemon_mine.to_dict("records"), "water")
    122.5
    """
    total_attack = 0
    num_pokemon = 0
    for record in data:
        pokemon_attack = record["atk"]
        if record["type"] == pokemon_type:
            total_attack += pokemon_attack
            num_pokemon += 1
    if num_pokemon == 0:
        return None
    return total_attack / num_pokemon


doctest.run_docstring_examples(python_mean_attack_for_type, globals())

In [12]:
def pandas_mean_attack_for_type(data, pokemon_type):
    """
    Given a data frame containing pokemon data, and a pokemon type,
    returns the mean atk for pokemon in the data set with type matching
    the provided pokemon type or None if there are 0 pokemon with the
    specified type in the data set

    >>> pandas_mean_attack_for_type(pokemon_box, "water")
    99.75
    >>> pandas_mean_attack_for_type(pokemon_test, "fire")
    47.5
    >>> pandas_mean_attack_for_type(pokemon_test, "air")
    >>> pandas_mean_attack_for_type(pokemon_mine, "water")
    122.5
    """
    pokemon_with_specified_type = data[data["type"] == pokemon_type]
    if len(pokemon_with_specified_type) == 0:
        return None
    return pokemon_with_specified_type["atk"].mean()


doctest.run_docstring_examples(pandas_mean_attack_for_type, globals())

## Task: Count types

Write a function `python_count_types` that takes a list of dictionaries representing the pokemon dataset and returns a dictionary of each pokemon `type` and the number of pokemon of that `type`. The order of entries in the returned dictionary does not matter.

Write a function `pandas_count_types` that does the same thing but using a `DataFrame` as input. To convert a `Series` to a `dict`, use the built-in `dict` function as shown below.

```python
csv = """
name,age,species
Fido,4,dog
Meowrty,6,cat
Chester,1,dog
Phil,1,axolotl
"""
data = pd.read_csv(io.StringIO(csv))

dict(data['name'])
# {0: 'Fido', 1: 'Meowrty', 2: 'Chester', 3: 'Phil'}

dict(data.loc[1])
# {'name': 'Meowrty', 'age': 6, 'species': 'cat'}
```

Add your test case and a descriptive docstring for both functions.
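For the plain-Python side, counting occurrences by key is a common pattern that the standard library also covers with `collections.Counter`. The sketch below is an optional alternative to a hand-written counting loop, shown on the pets data with hypothetical variable names.

```python
from collections import Counter

pets = [
    {"name": "Fido", "species": "dog"},
    {"name": "Meowrty", "species": "cat"},
    {"name": "Chester", "species": "dog"},
    {"name": "Phil", "species": "axolotl"},
]

# Counter tallies each species; dict() converts it to a plain dictionary.
species_counts = dict(Counter(record["species"] for record in pets))
print(species_counts)  # {'dog': 2, 'cat': 1, 'axolotl': 1}
```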

In [13]:
def python_count_types(data):
    """
    Given a list of records containing pokemon data, returns the counts of
    pokemon types that appear in the data set

    >>> pprint(python_count_types(pokemon_box.to_dict("records")))
    {'bug': 3,
     'electric': 1,
     'fairy': 3,
     'fighting': 3,
     'fire': 15,
     'flying': 6,
     'ghost': 3,
     'grass': 17,
     'ground': 5,
     'normal': 10,
     'poison': 12,
     'psychic': 6,
     'rock': 7,
     'water': 24}
    >>> pprint(python_count_types(pokemon_test.to_dict("records")))
    {'fire': 2, 'water': 2}
    >>> pprint(python_count_types(pokemon_mine.to_dict("records")))
    {'electric': 2, 'ground': 1, 'normal': 1, 'water': 2}
    """
    counts = {}
    for record in data:
        pokemon_type = record["type"]
        if pokemon_type not in counts:
            counts[pokemon_type] = 0
        counts[pokemon_type] += 1
    return counts



doctest.run_docstring_examples(python_count_types, globals())

In [14]:
def pandas_count_types(data):
    """
    Given a data frame containing pokemon data, returns the counts of
    pokemon types that appear in the data set

    >>> pprint(pandas_count_types(pokemon_box))
    {'bug': 3,
     'electric': 1,
     'fairy': 3,
     'fighting': 3,
     'fire': 15,
     'flying': 6,
     'ghost': 3,
     'grass': 17,
     'ground': 5,
     'normal': 10,
     'poison': 12,
     'psychic': 6,
     'rock': 7,
     'water': 24}
    >>> pprint(pandas_count_types(pokemon_test))
    {'fire': 2, 'water': 2}
    >>> pprint(pandas_count_types(pokemon_mine))
    {'electric': 2, 'ground': 1, 'normal': 1, 'water': 2}
    """
    return dict(data.groupby("type")["name"].count())


doctest.run_docstring_examples(pandas_count_types, globals())

## Task: Mean attack per type

Write a function `python_mean_attack_per_type` that takes a list of dictionaries representing the pokemon dataset and returns a dictionary of each pokemon `type` and the average `atk` of pokemon of that `type`. The order of entries in the returned dictionary does not matter.

Write a function `pandas_mean_attack_per_type` that does the same thing but using a `DataFrame` as input.

Add your test case and a descriptive docstring for both functions.
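A grouped mean can be computed in a single pass by keeping a running sum and count for each group, then dividing at the end. The sketch below illustrates that idea on the pets data, grouping `age` by `species`; `mean_age_per_species` is a hypothetical name, and this is only one of several reasonable designs.

```python
from collections import defaultdict

pets = [
    {"name": "Fido", "age": 4, "species": "dog"},
    {"name": "Meowrty", "age": 6, "species": "cat"},
    {"name": "Chester", "age": 1, "species": "dog"},
    {"name": "Phil", "age": 1, "species": "axolotl"},
]


def mean_age_per_species(records):
    """Return a dict mapping each species to the mean age of its records."""
    # Each species maps to a two-element list: [sum of ages, record count].
    totals = defaultdict(lambda: [0, 0])
    for record in records:
        entry = totals[record["species"]]
        entry[0] += record["age"]
        entry[1] += 1
    return {species: total / count for species, (total, count) in totals.items()}


species_means = mean_age_per_species(pets)
print(species_means)  # {'dog': 2.5, 'cat': 6.0, 'axolotl': 1.0}
```

Because every group that appears in the result has at least one record, the division never sees a zero count.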

In [15]:
def python_mean_attack_per_type(data):
    """
    Given a list of records containing pokemon data, returns the mean
    atk for pokemon in the data set grouped by their type (i.e., for
    each type of pokemon in the data set, returns the mean atk of the
    pokemon in the data set with that type)

    >>> pprint(python_mean_attack_per_type(pokemon_box.to_dict("records")))
    {'bug': 25.0,
     'electric': 64.0,
     'fairy': 76.33333333333333,
     'fighting': 99.66666666666667,
     'fire': 99.4,
     'flying': 110.83333333333333,
     'ghost': 88.0,
     'grass': 105.3529411764706,
     'ground': 116.6,
     'normal': 108.0,
     'poison': 121.75,
     'psychic': 114.83333333333333,
     'rock': 84.85714285714286,
     'water': 99.75}
    >>> pprint(python_mean_attack_per_type(pokemon_test.to_dict("records")))
    {'fire': 47.5, 'water': 140.5}
    >>> pprint(python_mean_attack_per_type(pokemon_mine.to_dict("records")))
    {'electric': 47.5, 'ground': 142.0, 'normal': 47.0, 'water': 122.5}
    """
    mean_attack_per_type = {}
    types = python_count_types(data).keys()
    # Use a descriptive loop variable to avoid shadowing the built-in `type`.
    for pokemon_type in types:
        mean_attack_per_type[pokemon_type] = python_mean_attack_for_type(data, pokemon_type)
    return mean_attack_per_type



doctest.run_docstring_examples(python_mean_attack_per_type, globals())

In [16]:
def pandas_mean_attack_per_type(data):
    """
    Given a data frame containing pokemon data, returns the mean
    atk for pokemon in the data set grouped by their type (i.e., for
    each type of pokemon in the data set, returns the mean atk of the
    pokemon in the data set with that type)

    >>> pprint(pandas_mean_attack_per_type(pokemon_box))
    {'bug': 25.0,
     'electric': 64.0,
     'fairy': 76.33333333333333,
     'fighting': 99.66666666666667,
     'fire': 99.4,
     'flying': 110.83333333333333,
     'ghost': 88.0,
     'grass': 105.3529411764706,
     'ground': 116.6,
     'normal': 108.0,
     'poison': 121.75,
     'psychic': 114.83333333333333,
     'rock': 84.85714285714286,
     'water': 99.75}
    >>> pprint(pandas_mean_attack_per_type(pokemon_test))
    {'fire': 47.5, 'water': 140.5}
    >>> pprint(pandas_mean_attack_per_type(pokemon_mine))
    {'electric': 47.5, 'ground': 142.0, 'normal': 47.0, 'water': 122.5}
    """
    return dict(data.groupby("type")["atk"].mean())


doctest.run_docstring_examples(pandas_mean_attack_per_type, globals())

## Testing

In [17]:
test_results = doctest.testmod()
print(test_results)
assert test_results.failed == 0, "There are failed doctests."
assert test_results.attempted >= 36, "Total number of doctests should be at least 36; fewer than 36 means you did not have three tests per function."

TestResults(failed=0, attempted=38)
