## Intro

### In this lesson, you will learn about...

- Pandas Series
- Attributes
- Binning values
- Summarizing a series
- Vectorized operation using a user-defined function

### By the end of this lesson, you should be able to...

- Create a new series
- Perform vectorized operations on a series
- Access attributes of a series
- Describe values of a series (.describe, .value_counts)
- Peek into the series (.head, .tail, .sample)
- Sort values (.sort_values, .sort_index)
- Test for values in the series (.isin, .any, .all)
- Perform string manipulation (.str)
- Apply a user-defined function to all items in a series (.apply)
- Bin continuous data to convert it to discrete (pd.cut)
- Plot series values (.plot)

### Agenda

1. About Pandas Series
2. Series Part I
    - Create a Series
    - Vectorized Operations
    - Series Attributes: .index, .values, .dtype, .name, .size, .shape
    - Series Methods: .head, .tail, .sample, .astype, .value_counts, .describe, .nlargest, .nsmallest, .sort_values, .sort_index
3. Exercises, part I
4. Series Part II
    - Indexing and Subsetting
    - Series Attribute: .str
    - Series Methods: .any, .all, .isin, .apply
5. Exercises, part II
6. Series Part III
    - Binning
    - Plotting
7. Exercises, part III

## 1. About Pandas Series

A pandas Series is a one-dimensional, labeled array. It consists of an index, which by default is autogenerated and starts at 0, and data of a single data type.

A couple of important things to note about a Series:

- If you build a pandas Series from values of several types, such as int and string values, pandas stores them all under the generic `object` data type. The Series is then no longer treated as numeric, so the int values lose the numeric behavior they would have in an integer Series.
- A pandas Series can be created in several ways; we will look at a few of these ways below. However, **it will most often be created by selecting a single column from a pandas DataFrame, in which case the Series retains the same index as the DataFrame.** We will dive into this in the next two lessons: DataFrames and Advanced DataFrames.
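To make the first point concrete, here is a minimal sketch of what happens to the data type when value types are mixed. The variable names `ints` and `mixed` are made up for illustration.

```python
import pandas as pd

# A Series built only from integers gets an integer dtype.
ints = pd.Series([1, 2, 3])

# Mixing integers and strings falls back to the generic object dtype.
mixed = pd.Series([1, "two", 3])

print(ints.dtype)
print(mixed.dtype)

# A numeric reduction works on the integer Series...
print(ints.sum())

# ...but fails on the mixed Series, because int + str is undefined.
try:
    mixed.sum()
except TypeError as err:
    print("TypeError:", err)
```

The individual elements of `mixed` are still ordinary Python objects. It is the Series as a whole that is no longer treated as numeric.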

---

NumPy vs. pandas

- NumPy: Python library for representing n-dimensional arrays.
- pandas: Python library, built upon NumPy, for representing Series and DataFrames, which are labeled, tabular structures.

---

Series vs. Dataframes

- Series: a one-dimensional, labeled array. A Series has row labels (its index) but no column labels.
- DataFrames: two-dimensional structures that represent datasets. Imagine a table with rows and columns. A DataFrame has both row labels and column labels.
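The difference in dimensions can be checked directly. The sketch below uses a small made-up DataFrame (the column names `fruit` and `quantity` and the index labels are assumptions for illustration) and selects one column from it.

```python
import pandas as pd

# A tiny, hypothetical dataset with explicit row labels.
df = pd.DataFrame(
    {"fruit": ["kiwi", "mango", "papaya"], "quantity": [2, 3, 5]},
    index=["a", "b", "c"],
)

# Selecting a single column returns a Series.
quantity = df["quantity"]

print(type(df).__name__, df.shape)              # two dimensions: rows and columns
print(type(quantity).__name__, quantity.shape)  # one dimension: rows only

# The Series keeps the row labels of the DataFrame it came from.
print(quantity.index.equals(df.index))

# The column label is carried along as the name of the Series.
print(quantity.name)
```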

---

Series vs. List

- A Series has an index, which can be thought of as a row name (often a row number) and gives you a way to reference items. The index is stored along with other meta-information (information about the Series).
- The elements of a Series are of a specific data type. The data type is inferred, but it can also be specified manually.
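Both differences can be seen in a short sketch. The labels and the variable names `inferred` and `labeled` are invented for this example. `index` and `dtype` are parameters of the `pd.Series` constructor.

```python
import pandas as pd

prices = [2, 3, 5]

# Default behavior: positions 0, 1, 2 as the index, data type inferred.
inferred = pd.Series(prices)

# Explicit row labels and an explicitly requested data type.
labeled = pd.Series(prices, index=["kiwi", "mango", "papaya"], dtype="float64")

print(list(inferred.index), inferred.dtype)
print(list(labeled.index), labeled.dtype)

# Items can be referenced by label instead of by position.
print(labeled["mango"])
```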

## 2. Series Part I

- Create a Series
- Series data types
- Vectorized Operations
- Series Attributes: .index, .values, .dtype, .name, .size, .shape
- Series Methods: .head, .tail, .sample, .astype, .value_counts, .describe, .nlargest, .nsmallest, .sort_values, .sort_index

Import Pandas

`import pandas as pd`

In [2]:
import pandas as pd
import numpy as np
from pydataset import data

### Create a Series

In practice, a Series is most often created by selecting a single column from a pandas DataFrame, in which case the Series retains the same index as the DataFrame. A Series can be created in four ways:

1. from a list
2. from a numpy array
3. from a dictionary
4. from a dataframe
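As a quick preview, the four routes can be sketched side by side. All variable names and the column name `n` are placeholders chosen for this example.

```python
import numpy as np
import pandas as pd

values = [2, 3, 5]

from_list = pd.Series(values)                    # 1. from a list
from_array = pd.Series(np.array(values))         # 2. from a numpy array
from_dict = pd.Series({"a": 2, "b": 3, "c": 5})  # 3. from a dictionary (keys become the index)
from_frame = pd.DataFrame({"n": values})["n"]    # 4. from a dataframe column

# Same values each time; only the dictionary version has custom labels.
for s in (from_list, from_array, from_dict, from_frame):
    print(s.tolist(), list(s.index))
```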

From a List

In [3]:
my_list = [2, 3, 5]
type(my_list)

list

A value in a list can be accessed by index, but list indices are always integers that represent position; they cannot be changed to a name, a datetime, etc.

In [4]:
my_list[0]

2

Create a Series from the list with `pd.Series(my_list)`, much as you would convert a list to an array with `np.array(my_list)`.

*Notice how the `S` is capitalized.*

In [5]:
my_series = pd.Series(my_list)

What kind of object is that?

In [6]:
type(my_series)

pandas.core.series.Series

What's inside the series?

`my_series`

In [7]:
my_series


0    2
1    3
2    5
dtype: int64

- There are 3 rows, with the row indices (or row names) [0, 1, 2].
- The values are [2, 3, 5].
- The data type is int64, i.e. 64-bit signed integers, which can hold very large values (up to 2^63 - 1).
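These three observations can also be confirmed programmatically. The sketch below rebuilds the same Series and uses `np.iinfo`, a NumPy helper that reports the limits of an integer type, to show how large an int64 value can get.

```python
import numpy as np
import pandas as pd

my_series = pd.Series([2, 3, 5])

# Row labels, values, and data type, each pulled out on its own.
print(list(my_series.index))
print(my_series.tolist())
print(my_series.dtype)

# Largest integer an int64 can hold: 2**63 - 1.
print(np.iinfo(np.int64).max)
```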