<hr style="border:5px solid #108999"> </hr>

# pandas Basics <hr style="border:4.5px solid #108999"> </hr>

## Introduction to pandas Series

Import the 'pandas' library to load it into the computer's memory, so that you can work with it in this Notebook Document.

<br/> *Note: Don't forget to use the widely accepted alias, pd, as well.*

<br/> Remember that no matter how many times you execute this code cell, the library will be imported only once in this Document, and it will remain active.

In [1]:
import pandas as pd
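
The remark that the library is imported only once can be checked with a short sketch. It relies on Python's module cache, `sys.modules`; the alias `pd_again` is a hypothetical name used only for this illustration.

```python
import sys

import pandas as pd

# After the first import, the module object is stored in Python's module cache.
print('pandas' in sys.modules)

# A second import statement reuses the cached module instead of loading it again.
import pandas as pd_again

print(pd_again is pd)
```

Both names refer to the same module object, which is why re-running the import cell has no further effect.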

Check the version of the library you just imported.

In [2]:
pd.__version__

'1.0.5'

Create the **employee_names** list.

In [3]:
employee_names = ['Amy White', 'Jack Stewart', 'Richard Lauderdale', 'Sara Johnson']
employee_names

['Amy White', 'Jack Stewart', 'Richard Lauderdale', 'Sara Johnson']

Verify the **employee_names** object is a list.

In [4]:
type(employee_names)

list

Create a pandas Series object containing the elements from the **employee_names** list. Call it **employee_names_Series**.

In [5]:
employee_names_Series = pd.Series(employee_names)
employee_names_Series

0             Amy White
1          Jack Stewart
2    Richard Lauderdale
3          Sara Johnson
dtype: object

Confirm the object is of a Series type.

*Note: Feel free to take advantage of the Jupyter autocompletion feature. You can activate it through the Tab button while typing code.*

In [6]:
type(employee_names_Series)

pandas.core.series.Series

Now, create a Series object directly. That is, not from an existing list, but by using the following structure:
**pd.Series([...])**
<br/>Let the elements of the Series object be the following numbers: 5, 8, 3, and 10. Name the object **work_experience_years**.

In [7]:
work_experience_years = pd.Series([5,8,3,10])
work_experience_years

0     5
1     8
2     3
3    10
dtype: int64

Import the 'NumPy' library to load it into the computer's memory, so that you can work with it in this Notebook Document.

<br/> *Note: Don't forget to use the widely accepted alias, np, as well.*

In [8]:
import numpy as np

Execute the code cell below to create the **array_age** NumPy array object.

In [9]:
array_age = np.array([50, 53, 35, 43])
array_age

array([50, 53, 35, 43])

Verify the type of the **array_age** object:

In [10]:
type(array_age)

numpy.ndarray

Create a Series object called **series_age** from the NumPy array object **array_age** you just created.

In [11]:
series_age = pd.Series(array_age)
series_age

0    50
1    53
2    35
3    43
dtype: int32
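
The `int32` shown here, compared with `int64` for the Series built from a list, most likely reflects the platform the notebook was run on. A Series created from a NumPy array keeps the array's dtype, and NumPy's default integer type is 32-bit on some systems (for example, Windows with older NumPy versions). A minimal sketch, assuming you want the same dtype everywhere; `series_age_64` is a hypothetical name.

```python
import numpy as np
import pandas as pd

array_age = np.array([50, 53, 35, 43])

# The Series inherits whatever integer dtype NumPy chose for the array.
series_age = pd.Series(array_age)
print(series_age.dtype == array_age.dtype)

# Passing dtype explicitly makes the result independent of the platform default.
series_age_64 = pd.Series(array_age, dtype='int64')
print(series_age_64.dtype)
```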

Check the type of the newly created object.

In [12]:
type(series_age)

pandas.core.series.Series

Use the *print()* function to display the content of **series_age**.

In [13]:
print(series_age)

0    50
1    53
2    35
3    43
dtype: int32


## Working with Attributes in Python

Focus on the following Series object.

In [14]:
work_experience_years = pd.Series([5,8,3,10])
work_experience_years

0     5
1     8
2     3
3    10
dtype: int64

Return the values stored in **work_experience_years**.

In [15]:
work_experience_years.to_numpy()

array([ 5,  8,  3, 10], dtype=int64)

Check the type of the returned object.

In [17]:
type(work_experience_years.values)

numpy.ndarray
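
A Series exposes its data both through the `.values` attribute and through the `to_numpy()` method. The sketch below, with the hypothetical names `from_attribute` and `from_method`, suggests that for a simple integer Series like this one the two give NumPy arrays with the same contents.

```python
import numpy as np
import pandas as pd

work_experience_years = pd.Series([5, 8, 3, 10])

# Attribute access: no parentheses.
from_attribute = work_experience_years.values

# Method call: parentheses are required.
from_method = work_experience_years.to_numpy()

print(type(from_attribute), type(from_method))
print(np.array_equal(from_attribute, from_method))
```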

Use an attribute to find the number of elements in the underlying data.
<br/>*Note: The same output will be displayed whether or not you make use of the **print()** function.*

In [18]:
work_experience_years.size

4
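
For comparison, here is a small sketch of two other ways to get at the same count. The built-in `len()` and the `.shape` attribute are not part of the exercise and are shown only for illustration.

```python
import pandas as pd

work_experience_years = pd.Series([5, 8, 3, 10])

# .size is an attribute, so it is read without parentheses.
print(work_experience_years.size)

# The built-in len() reports the same number of elements.
print(len(work_experience_years))

# .shape returns a tuple; for a Series it has a single entry.
print(work_experience_years.shape)
```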

Assign the following name to this Series: **Work Experience (Yrs.)**

In [19]:
work_experience_years.name = "Work Experience (Yrs.)"

Display the name of the Series.

In [20]:
work_experience_years.name

'Work Experience (Yrs.)'

Display the Series itself, to see the name appear below the data values it contains.

In [21]:
work_experience_years

0     5
1     8
2     3
3    10
Name: Work Experience (Yrs.), dtype: int64
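
As an alternative to assigning the name afterwards, the `name` parameter of `pd.Series` can set it when the object is created. A minimal sketch:

```python
import pandas as pd

# The name is supplied at construction time instead of through the .name attribute.
work_experience_years = pd.Series([5, 8, 3, 10], name="Work Experience (Yrs.)")

print(work_experience_years.name)
```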

## Using an Index in pandas

Execute the following code cell to create a dictionary with the employees' names as its *keys* and their ages as its *values*.

In [22]:
workers_age = {'Amy White':50, 'Jack Stewart':53, 'Richard Lauderdale':35, 'Sara Johnson':43}
workers_age

{'Amy White': 50,
 'Jack Stewart': 53,
 'Richard Lauderdale': 35,
 'Sara Johnson': 43}

Verify that **workers_age** is a dictionary.

In [23]:
type(workers_age)

dict

Create a Series from **workers_age**, giving it the same name.

In [24]:
workers_age = pd.Series(workers_age)
workers_age

Amy White             50
Jack Stewart          53
Richard Lauderdale    35
Sara Johnson          43
dtype: int64

Verify **workers_age** is a Series object.

In [25]:
type(workers_age)

pandas.core.series.Series

Retrieve the index of **workers_age**.

In [26]:
workers_age.index

Index(['Amy White', 'Jack Stewart', 'Richard Lauderdale', 'Sara Johnson'], dtype='object')

## Label-based vs Position-based Indexing

Create a pandas Series object from a dictionary with keys "Martin" and "George" and values 8 and 5, respectively. Call this Series **employees_work_exp**, short for "employees' work experience".

In [27]:
employees_work_exp = pd.Series({'Martin':8, 'George':5})
employees_work_exp

Martin    8
George    5
dtype: int64

Retrieve the index values to see they are *labels*.

In [28]:
employees_work_exp.index

Index(['Martin', 'George'], dtype='object')

Extract the first of these values to prove they are strings.  

In [29]:
type(employees_work_exp.index[0])

str

Create a pandas Series object from an array that contains the following values: 44, 54, 65, 35. Call it **series_age**.

In [30]:
series_age = pd.Series(np.array([44, 54, 65, 35]))
series_age

0    44
1    54
2    65
3    35
dtype: int32

Retrieve the index values of **series_age** to see they are integers, which is what makes position-based indexing possible.

In [31]:
series_age.index

RangeIndex(start=0, stop=4, step=1)
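
The difference between the two kinds of indexing can be sketched with the `.loc` (label-based) and `.iloc` (position-based) accessors. These accessors are not introduced in the exercise above; they are used here only to illustrate the distinction.

```python
import pandas as pd

employees_work_exp = pd.Series({'Martin': 8, 'George': 5})

# Label-based access: look the value up by its index label.
by_label = employees_work_exp.loc['George']

# Position-based access: look the value up by its integer position (0-based).
by_position = employees_work_exp.iloc[1]

print(by_label, by_position)
```

Both lookups reach the same element, once through its label and once through its position.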

## Using Methods in Python

Consider the following Series object.

In [32]:
employees_work_exp = pd.Series({
'Amy White'   : 3,
'Jack Stewart'   : 5,
'Richard Lauderdale'  : 4.5,
'Sara Johnson'  : 22,
'Patrick Adams' : 28,
'Jessica Baker'  : 14,
'Peter Hunt'   : 4,
'Daniel Lloyd'  : 6,
'John Owen'   : 1.5,
'Jennifer Phillips'  : 10,
'Courtney Rogers'   : 4.5,
'Anne Robinson'  : 2,
})

Use a method to extract the first five values from this Series.  <br/> *Please be aware that pandas displays all values as floats here, because the Series contains non-integer values such as 4.5.*


In [33]:
employees_work_exp.head()

Amy White              3.0
Jack Stewart           5.0
Richard Lauderdale     4.5
Sara Johnson          22.0
Patrick Adams         28.0
dtype: float64
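
The float display follows from a Series holding a single dtype for all of its values. A minimal sketch, using the hypothetical names `whole_years` and `mixed_years`, of how one non-integer value changes the dtype of the whole Series:

```python
import pandas as pd

# Only whole numbers: pandas keeps an integer dtype.
whole_years = pd.Series({'Amy White': 3, 'Jack Stewart': 5})

# One non-integer value: every element is stored as a float.
mixed_years = pd.Series({'Amy White': 3, 'Richard Lauderdale': 4.5})

print(whole_years.dtype)
print(mixed_years.dtype)
```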

Use another method to extract the last five rows of **employees_work_exp**.

In [34]:
employees_work_exp.tail()

Daniel Lloyd          6.0
John Owen             1.5
Jennifer Phillips    10.0
Courtney Rogers       4.5
Anne Robinson         2.0
dtype: float64

## Parameters vs Arguments

Consider the following Series object.

In [35]:
employees_work_exp = pd.Series({
'Amy White'   : 3,
'Jack Stewart'   : 5,
'Richard Lauderdale'  : 4.5,
'Sara Johnson'  : 22,
'Patrick Adams' : 28,
'Jessica Baker'  : 14,
'Peter Hunt'   : 4,
'Daniel Lloyd'  : 6,
'John Owen'   : 1.5,
'Jennifer Phillips'  : 10,
'Courtney Rogers'   : 4.5,
'Anne Robinson'  : 2,
})

Use a pandas method to retrieve the first three records of the object.

In [36]:
employees_work_exp.head(3)

Amy White             3.0
Jack Stewart          5.0
Richard Lauderdale    4.5
dtype: float64

Use a pandas method to retrieve the last four records of the object.

In [37]:
employees_work_exp.tail(4)

John Owen             1.5
Jennifer Phillips    10.0
Courtney Rogers       4.5
Anne Robinson         2.0
dtype: float64
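
To connect this to the section title: `n` is the *parameter* of `head()` and `tail()`, while the number you pass in is the *argument*. The sketch below uses a shortened version of the Series and the hypothetical names `first_three` and `first_three_kw` to show the same argument passed by position and by keyword.

```python
import pandas as pd

employees_work_exp = pd.Series({
    'Amy White': 3,
    'Jack Stewart': 5,
    'Richard Lauderdale': 4.5,
    'Sara Johnson': 22,
})

# Positional argument: 3 is matched to the parameter n by position.
first_three = employees_work_exp.head(3)

# Keyword argument: the parameter n is named explicitly.
first_three_kw = employees_work_exp.head(n=3)

print(first_three.equals(first_three_kw))
```

When no argument is supplied, `n` falls back to its default of 5, which is why `head()` and `tail()` returned five rows in the previous section.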

## Introduction to pandas DataFrames

Create the following DataFrame in 4 different ways. (You don't need to think about assigning index values yet.)

![Capture.PNG](attachment:Capture.PNG)

Example solutions:

In [38]:
data = {
    "Name":['Amy White', 'Jack Stewart', 'Richard Lauderdale', 'Sara Johnson'], 
    "Age":[50, 53, 35, 43], 
    "Working Experience (Yrs.)":[5,8,3,10]}
df = pd.DataFrame(data)
df

| | Name | Age | Working Experience (Yrs.) |
|---|---|---|---|
| 0 | Amy White | 50 | 5 |
| 1 | Jack Stewart | 53 | 8 |
| 2 | Richard Lauderdale | 35 | 3 |
| 3 | Sara Johnson | 43 | 10 |


In [39]:
data = [{'Name':'Amy White', 'Age':50, 'Working Experience (Yrs.)':5}, 
        {'Name':'Jack Stewart', 'Age':53, 'Working Experience (Yrs.)':8}, 
        {'Name':'Richard Lauderdale', 'Age':35, 'Working Experience (Yrs.)':3},
        {'Name':'Sara Johnson', 'Age':43, 'Working Experience (Yrs.)':10}]
df = pd.DataFrame(data)
df

| | Name | Age | Working Experience (Yrs.) |
|---|---|---|---|
| 0 | Amy White | 50 | 5 |
| 1 | Jack Stewart | 53 | 8 |
| 2 | Richard Lauderdale | 35 | 3 |
| 3 | Sara Johnson | 43 | 10 |


In [40]:
names = pd.Series(['Amy White', 'Jack Stewart', 'Richard Lauderdale', 'Sara Johnson'])
age = pd.Series([50, 53, 35, 43])
working_experience_yrs = pd.Series([5,8,3,10])

data = {'Name':names, 'Age':age, 'Working Experience (Yrs.)':working_experience_yrs}
df = pd.DataFrame(data)
df

| | Name | Age | Working Experience (Yrs.) |
|---|---|---|---|
| 0 | Amy White | 50 | 5 |
| 1 | Jack Stewart | 53 | 8 |
| 2 | Richard Lauderdale | 35 | 3 |
| 3 | Sara Johnson | 43 | 10 |


In [41]:
data = [['Amy White', 50, 5], ['Jack Stewart', 53, 8], ['Richard Lauderdale', 35, 3], ['Sara Johnson', 43, 10]]
df = pd.DataFrame(data, columns = ['Name', 'Age', 'Working Experience (Yrs.)'])
df

| | Name | Age | Working Experience (Yrs.) |
|---|---|---|---|
| 0 | Amy White | 50 | 5 |
| 1 | Jack Stewart | 53 | 8 |
| 2 | Richard Lauderdale | 35 | 3 |
| 3 | Sara Johnson | 43 | 10 |


Modify the code below to add integers starting from 1 in ascending order as index values.  

In [42]:
data = {
    "Name":['Amy White', 'Jack Stewart', 'Richard Lauderdale', 'Sara Johnson'], 
    "Age":[50, 53, 35, 43], 
    "Working Experience (Yrs.)":[5,8,3,10]}
df = pd.DataFrame(data)
df

| | Name | Age | Working Experience (Yrs.) |
|---|---|---|---|
| 0 | Amy White | 50 | 5 |
| 1 | Jack Stewart | 53 | 8 |
| 2 | Richard Lauderdale | 35 | 3 |
| 3 | Sara Johnson | 43 | 10 |


becomes

In [43]:
data = {
    "Name":['Amy White', 'Jack Stewart', 'Richard Lauderdale', 'Sara Johnson'], 
    "Age":[50, 53, 35, 43], 
    "Working Experience (Yrs.)":[5,8,3,10]}
df = pd.DataFrame(data, index = [1, 2, 3, 4])
df

| | Name | Age | Working Experience (Yrs.) |
|---|---|---|---|
| 1 | Amy White | 50 | 5 |
| 2 | Jack Stewart | 53 | 8 |
| 3 | Richard Lauderdale | 35 | 3 |
| 4 | Sara Johnson | 43 | 10 |
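
Writing the index values out by hand works for four rows but does not scale. One possible alternative, sketched here as an assumption rather than as the expected solution, is to derive the index from the length of the DataFrame.

```python
import pandas as pd

data = {
    "Name": ['Amy White', 'Jack Stewart', 'Richard Lauderdale', 'Sara Johnson'],
    "Age": [50, 53, 35, 43],
    "Working Experience (Yrs.)": [5, 8, 3, 10]}
df = pd.DataFrame(data)

# Replace the default 0-based index with 1, 2, ..., len(df).
df.index = range(1, len(df) + 1)

print(df.index.tolist())
```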
