## pandas Series and DataFrame
- pandas handles mixed data types, custom indexing, missing data, inconsistently structured data, and data that must be reshaped into forms appropriate for databases and data-analysis packages.
- It provides two key collections: `Series` for one-dimensional collections and `DataFrame`s for two-dimensional collections.
- A `MultiIndex` lets you manipulate multidimensional data in the context of `Series` and `DataFrame`s.

### pandas `Series`
- A Series is an enhanced one-dimensional array.
- Series support custom indexing, including even non-integer indices like strings.

#### Creating a `Series` with Default Indices

In [None]:
import pandas as pd

In [None]:
grades = pd.Series([87, 100, 94])

In [None]:
grades

In [None]:
pd.Series((87, 100, 94))

In [None]:
pd.Series({'math':87,'eng':100, 'python':94})

In [None]:
import numpy as np
pd.Series(np.array([87, 100, 94]))

In [None]:
pd.Series(pd.Series([1, 2, 3]))
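When the data is a dictionary, its keys become the index. The following minimal sketch reuses the subject scores from above. Passing an explicit `index` alongside a dictionary selects and orders the entries by those labels. A label with no matching key (the hypothetical `'art'` here) gets `NaN`.

```python
import pandas as pd

scores = {'math': 87, 'eng': 100, 'python': 94}

# Dictionary keys become the index labels, in insertion order
by_keys = pd.Series(scores)

# An explicit index picks values by label; 'art' is a hypothetical label
# with no matching key, so its value is NaN (and the dtype becomes float)
reordered = pd.Series(scores, index=['python', 'math', 'art'])
print(reordered)
```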

#### Creating a `Series` with All Elements Having the Same Value

In [None]:
pd.Series(98.6, range(3))

In [None]:
pd.Series(98.6, [1, 2, 3])

In [None]:
pd.Series(98.6, (1, 2, 3))

In [None]:
pd.Series(98.6, {'a':1, 'b':2, 'c':3})

#### Accessing a `Series`’ Elements

In [None]:
grades[0]

#### Producing Descriptive Statistics for a `Series`

In [None]:
grades.count()

In [None]:
grades.mean()

In [None]:
grades.min()

In [None]:
grades.max()

In [None]:
grades.std()

In [None]:
grades.describe()
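`describe` bundles the individual statistics above into one `Series`. This small sketch checks two details. First, the result is labeled with `count`, `mean`, `std`, `min`, the quartiles and `max`. Second, `std` computes the *sample* standard deviation (`ddof=1` by default), whereas NumPy's `np.std` defaults to `ddof=0`.

```python
import numpy as np
import pandas as pd

grades = pd.Series([87, 100, 94])
summary = grades.describe()

# describe() returns a Series indexed by statistic name
print(list(summary.index))

# pandas' std uses ddof=1 (sample standard deviation) by default;
# NumPy's np.std defaults to ddof=0, so pass ddof=1 to match
pandas_std = grades.std()
numpy_std = np.std(grades.to_numpy(), ddof=1)
print(pandas_std, numpy_std)
```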

#### Creating a `Series` with Custom Indices

In [None]:
grades = pd.Series([87, 100, 94], index=['Wally', 'Eva', 'Sam'])

In [None]:
grades

#### Accessing a `Series`’ Elements Via Custom Indices

In [None]:
grades['Eva']

In [None]:
# Attribute access works when the custom indices are strings that are valid Python identifiers
grades.Wally
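With custom labels, `grades['Eva']` and `grades.Eva` look elements up by label. To select by integer position instead, use `iloc`. Plain `grades[0]` on a string-indexed `Series` relies on a positional fallback that recent pandas versions deprecate. A brief sketch:

```python
import pandas as pd

grades = pd.Series([87, 100, 94], index=['Wally', 'Eva', 'Sam'])

by_label = grades['Eva']      # label-based lookup
by_attribute = grades.Eva     # works because 'Eva' is a valid identifier
by_position = grades.iloc[1]  # integer position 1 is the second element
print(by_label, by_attribute, by_position)
```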

In [None]:
grades.dtype

In [None]:
grades.values

In [None]:
grades.index

#### Creating a `Series` of Strings
- If a `Series` contains strings, you can use its `str` attribute to call string methods on the elements.

In [None]:
hardware = pd.Series(['Hammer', 'Saw', 'Wrench'])

In [None]:
hardware

In [None]:
hardware.str.contains('a')

In [None]:
hardware.str.upper()
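The `str` accessor exposes many more vectorized string methods. The sketch below uses a few well-known ones. Each returns a new `Series` and leaves `hardware` itself unchanged.

```python
import pandas as pd

hardware = pd.Series(['Hammer', 'Saw', 'Wrench'])

lengths = hardware.str.len()                   # number of characters per element
starts_with_w = hardware.str.startswith('W')   # element-wise Boolean test
lowered = hardware.str.lower()                 # new Series; original is untouched
print(lengths.tolist(), starts_with_w.tolist(), lowered.tolist())
```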

### pandas `DataFrame`s
- A DataFrame is an enhanced two-dimensional array.
- Each column in a DataFrame is a Series.

#### Creating a `DataFrame` from a Dictionary

In [None]:
import pandas as pd

In [None]:
grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}

In [None]:
grades_dict

In [None]:
grades = pd.DataFrame(grades_dict)

In [None]:
grades
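Each dictionary key becomes a column label, and, as noted above, each column is itself a `Series`. A quick check with the same `grades_dict` data:

```python
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}
grades = pd.DataFrame(grades_dict)

# Keys become column labels; rows get a default RangeIndex (0, 1, 2)
print(list(grades.columns))
print(list(grades.index))

# Selecting one column yields a Series that shares the DataFrame's index
eva = grades['Eva']
print(type(eva).__name__)
```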

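#### Customizing a `DataFrame`’s Indices with the `index` Attribute
- Custom indices can be passed with the `index` keyword argument when the `DataFrame` is created, or assigned afterward to the `index` attribute, as in the next cell:

```python
pd.DataFrame(grades_dict, index=['Test1', 'Test2', 'Test3'])
```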

In [None]:
grades.index = ['Test1', 'Test2', 'Test3']

In [None]:
grades

#### Accessing a `DataFrame`’s Columns 

In [None]:
grades['Eva']

In [None]:
grades.Sam

#### Selecting Rows via the `loc` and `iloc` Attributes

In [None]:
grades.loc['Test1']

In [None]:
grades.iloc[1]
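`loc` selects by label and `iloc` by zero-based integer position, so with the `Test1` through `Test3` index both calls above reach a row the same way a dictionary key or a list position would. This sketch confirms that the two approaches return the same row. The selected row is a `Series` indexed by the column labels.

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
                       'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
                       'Bob': [83, 65, 85]},
                      index=['Test1', 'Test2', 'Test3'])

# Label-based and position-based access to the same row
row_by_label = grades.loc['Test2']
row_by_position = grades.iloc[1]

# The row is a Series whose index is the column labels and whose name is the row label
print(row_by_label.equals(row_by_position))
print(row_by_label.name)
```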

#### Selecting Rows via Slices and Lists with the `loc` and `iloc` Attributes

In [None]:
grades.loc['Test1':'Test3']

In [None]:
grades.iloc[0:2]

In [None]:
grades.loc[['Test1', 'Test3']]

In [None]:
grades.iloc[[0, 2]]
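The two slicing styles differ at the end point. A `loc` slice *includes* its end label, while an `iloc` slice *excludes* its end position, as in ordinary Python slicing. This sketch selects the first two rows both ways:

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
                       'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
                       'Bob': [83, 65, 85]},
                      index=['Test1', 'Test2', 'Test3'])

label_slice = grades.loc['Test1':'Test2']   # end label 'Test2' is included
position_slice = grades.iloc[0:2]           # end position 2 is excluded

print(list(label_slice.index))
print(list(position_slice.index))
```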

#### Selecting Subsets of the Rows and Columns 

In [None]:
grades.loc['Test1':'Test2', ['Eva', 'Katie']]

In [None]:
grades.iloc[[0, 2], 0:3]

#### Boolean Indexing

In [None]:
grades[grades >= 90]

In [None]:
grades[(grades >= 80) & (grades < 90)]
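Boolean indexing on the whole `DataFrame` keeps its shape, and cells where the condition is `False` become `NaN`. Combined conditions need the element-wise operators `&` and `|` (not `and`/`or`). They also need parentheses, because `&` binds more tightly than comparisons in Python. To keep whole *rows* instead, index with a Boolean `Series` built from one column. A sketch with the same grades:

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
                       'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
                       'Bob': [83, 65, 85]},
                      index=['Test1', 'Test2', 'Test3'])

high = grades[grades >= 90]                       # same shape; non-matching cells are NaN
b_range = grades[(grades >= 80) & (grades < 90)]  # parentheses are required around each test

# count() skips NaN, so summing the per-column counts gives the matching cells
print(high.count().sum(), b_range.count().sum())

# Filtering whole rows with a Boolean Series from a single column
eva_high_rows = grades[grades['Eva'] >= 90]
print(list(eva_high_rows.index))
```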

#### Accessing a Specific `DataFrame` Cell by Row and Column
- A `DataFrame`’s `at` and `iat` attributes get or set a single value, by row and column label (`at`) or by integer row and column position (`iat`).

In [None]:
grades.at['Test2', 'Eva']

In [None]:
grades.iat[2, 0]

In [None]:
grades.at['Test2', 'Eva'] = 100

In [None]:
grades.at['Test2', 'Eva']

In [None]:
grades.iat[1, 1] = 87

In [None]:
grades.iat[1, 1]

#### Descriptive Statistics

In [None]:
grades.describe()

In [None]:
pd.set_option("display.precision", 2)

In [None]:
grades.describe()

In [None]:
grades.mean()
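The `display.precision` option set above only affects how numbers are *printed*. The stored values keep full floating-point precision. This sketch uses `pd.option_context`, which applies an option temporarily inside a `with` block, and then inspects the unrounded mean:

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
                       'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
                       'Bob': [83, 65, 85]},
                      index=['Test1', 'Test2', 'Test3'])

means = grades.mean()   # column (per-student) averages

with pd.option_context('display.precision', 2):
    print(means)        # printed with two digits after the decimal point

# The stored value is not rounded: Wally's average is 253 / 3
print(means['Wally'])
```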

#### Transposing the `DataFrame` with the `T` Attribute

In [None]:
grades.T

In [None]:
grades.T.describe()

In [None]:
grades.T.mean()
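Transposing is one way to get per-test statistics. Many reductions also accept `axis=1` to compute across each row directly. This sketch checks that both approaches give the same per-test averages:

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
                       'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
                       'Bob': [83, 65, 85]},
                      index=['Test1', 'Test2', 'Test3'])

per_test_via_T = grades.T.mean()        # transpose, then average each column
per_test_via_axis = grades.mean(axis=1) # average across each row directly
print(per_test_via_axis)
```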

#### Sorting Rows by Their Indices
- You can sort a `DataFrame` by its rows or by its columns, based on either their indices or their values.

In [None]:
grades.sort_index(ascending=False)

In [None]:
grades.sort_index()

#### Sorting By Column Indices

In [None]:
grades.sort_index(axis=1)

In [None]:
grades.sort_index(axis=0)
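`sort_index` sorts rows by default (`axis=0`). With `axis=1` it sorts the column labels instead. Without `inplace=True`, both calls return new `DataFrame`s. A sketch:

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
                       'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
                       'Bob': [83, 65, 85]},
                      index=['Test1', 'Test2', 'Test3'])

desc_rows = grades.sort_index(ascending=False)  # row labels in descending order
sorted_cols = grades.sort_index(axis=1)         # column labels in alphabetical order
print(list(desc_rows.index))
print(list(sorted_cols.columns))
```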

#### Sorting By Column Values

In [None]:
grades.sort_values(by='Test1', axis=1, ascending=False)

In [None]:
grades.T.sort_values(by='Test1', ascending=False)

In [None]:
grades.loc['Test1'].sort_values(ascending=False)

In [None]:
grades

#### Copy vs. In-Place Sorting

In [None]:
grades.sort_values(by='Test1', axis=1, ascending=False, inplace=True)

In [None]:
grades
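By default, `sort_values` returns a new, sorted `DataFrame` and leaves the original untouched. With `inplace=True` it modifies the original and returns `None`. The sketch below makes the difference visible. It does not rely on any particular order for ties such as Eva's and Katie's 100.

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
                       'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
                       'Bob': [83, 65, 85]},
                      index=['Test1', 'Test2', 'Test3'])
original_columns = list(grades.columns)

# Copy: a new DataFrame is returned and grades keeps its column order
by_test1 = grades.sort_values(by='Test1', axis=1, ascending=False)
print(list(by_test1.columns))
print(list(grades.columns) == original_columns)

# In-place: grades itself is reordered and the method returns None
result = grades.sort_values(by='Test1', axis=1, ascending=False, inplace=True)
print(result is None, list(grades.columns))
```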

### Regular Expressions and Data Munging 
- Preparing data for analysis is called data munging or data wrangling.
- Two of the most important steps in data munging are data cleaning and transforming data into the optimal formats for your database systems and analytics software.

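#### Cleaning Your Data
- Bad or missing values can distort an analysis. For example, consider the record `['Brown, Sue', 36.5, 36.3, 36.7, 0.0]`, which holds a name followed by four readings. Its final `0.0` is likely a placeholder for a missing reading rather than a real measurement, and including it would drag down any average.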
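This is a minimal sketch of one common cleaning step. It assumes the trailing `0.0` is a placeholder for a missing reading rather than a real measurement. Replacing the placeholder with `NaN` keeps it out of pandas' statistics, which skip missing values by default (`skipna=True`), so it no longer distorts the average.

```python
import numpy as np
import pandas as pd

# The record from above: a name followed by four numeric readings
record = ['Brown, Sue', 36.5, 36.3, 36.7, 0.0]
readings = pd.Series(record[1:], dtype=float)

# Assumption: 0.0 marks a missing reading, not a real value
naive_mean = readings.mean()              # includes the 0.0, so it is far too low
cleaned = readings.replace(0.0, np.nan)   # mark the placeholder as missing
cleaned_mean = cleaned.mean()             # NaN is skipped (skipna=True by default)
print(naive_mean, cleaned_mean, cleaned.isna().sum())
```

Whether `0.0` really means "missing" is a judgment about the data source. In a real data set, that choice should be confirmed before values are replaced.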

#### Data Validation

In [None]:
import pandas as pd

In [None]:
zips = pd.Series({'Boston': '02215', 'Miami': '3310'})

In [None]:
zips

In [None]:
zips.str.match(r'\d{5}')

In [None]:
cities = pd.Series(['Boston, MA 02215', 'Miami, FL 33101'])

In [None]:
cities

In [None]:
cities.str.contains(r' [A-Z]{2} ')

In [None]:
cities.str.match(r' [A-Z]{2} ')
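`str.match` anchors the pattern only at the *start* of each string. As a result, a six-digit value still passes `\d{5}`, and a pattern that appears mid-string (like the state code) never matches. `str.contains` searches anywhere in the string, and `str.fullmatch` (available since pandas 1.1) requires the whole string to match. A sketch; the `'Typo'` entry is a hypothetical bad ZIP code:

```python
import pandas as pd

zips = pd.Series({'Boston': '02215', 'Miami': '3310', 'Typo': '022155'})

starts_ok = zips.str.match(r'\d{5}')     # anchored at the start only
exact_ok = zips.str.fullmatch(r'\d{5}')  # entire string must be exactly five digits

cities = pd.Series(['Boston, MA 02215', 'Miami, FL 33101'])
has_state = cities.str.contains(r' [A-Z]{2} ')  # searches anywhere in the string
print(starts_ok.tolist(), exact_ok.tolist(), has_state.tolist())
```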

#### Reformatting Your Data

In [None]:
contacts = [['Mike Green', 'demo1@deitel.com', '5555555555'],
            ['Sue Brown', 'demo2@deitel.com', '5555551234']]

In [None]:
contactsdf = pd.DataFrame(contacts, 
                          columns=['Name', 'Email', 'Phone'])

In [None]:
contactsdf

In [None]:
import re

In [None]:
def get_formatted_phone(value):
    result = re.fullmatch(r'(\d{3})(\d{3})(\d{4})', value)
    return '-'.join(result.groups()) if result else value

In [None]:
formatted_phone = contactsdf['Phone'].map(get_formatted_phone)

In [None]:
formatted_phone

In [None]:
contactsdf['Phone'] = formatted_phone

In [None]:
contactsdf
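The `map` call above applies a Python function to each element. An alternative sketch, not taken from the original text, uses the vectorized `str.replace` with `regex=True` and backreferences. Anchoring the pattern with `^...$` mirrors the `re.fullmatch` behavior of `get_formatted_phone`, so non-matching values (the hypothetical `'555-1234'`) are left unchanged.

```python
import pandas as pd

phones = pd.Series(['5555555555', '5555551234', '555-1234'])

# Only values that are exactly ten digits are rewritten; others pass through
formatted = phones.str.replace(r'^(\d{3})(\d{3})(\d{4})$', r'\1-\2-\3', regex=True)
print(formatted.tolist())
```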