<hr style="border:5px solid #108999">

# pandas Series <hr style="border:4.5px solid #108999">

To begin, import the pandas library under its conventional alias, *pd*.

In [1]:
import pandas as pd

## .unique() & .nunique()

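Load the "Region.csv" file and convert its single column to a Series. Older pandas versions did this with the *squeeze = True* option of *read_csv*; that option was removed in pandas 2.0, so call the *.squeeze("columns")* method on the loaded DataFrame instead. Store the information in a variable called **region_data**. Preview the data with the pandas *.head()* method.

In [2]:
data = pd.read_csv('Region.csv').squeeze("columns")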
region_data = data.copy()
region_data.head()

0    Region 2
1    Region 6
2    Region 3
3    Region 2
4    Region 3
Name: Region, dtype: object

Verify that **region_data** is a Series object.

In [3]:
type(region_data)

pandas.core.series.Series

Use the *.describe()* method to obtain descriptive statistics on the **region_data** Series.
<br>Think of how many unique values there are in the data set. 
<br>*Please note that the statistics provided in the output exclude missing data.*

In [4]:
region_data.describe()

count         1042
unique          18
top       Region 6
freq           326
Name: Region, dtype: object

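You can obtain some of the values from the previous output with Python built-in functions or pandas methods. Extract the total number of values in the **region_data** Series, then the number of unique values. Finally, obtain an array containing all unique values from this Series. *Note that len() counts missing values too, whereas the count in the .describe() output does not; the two agree here because the Series has no missing values.*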

In [5]:
len(region_data)

1042

In [6]:
region_data.nunique()

18

In [7]:
region_data.unique()

array(['Region 2', 'Region 6', 'Region 3', 'Region 1', 'Region 5',
       'Region 9', 'Region 7', 'Region 4', 'Region 12', 'Region 16',
       'Region 8', 'Region 10', 'Region 13', 'Region 15', 'Region 11',
       'Region 14', 'Region 17', 'Region 18'], dtype=object)
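
To see how these calls differ when data is missing, here is a minimal sketch on a small, made-up Series (the values below are hypothetical and are not taken from "Region.csv"). It assumes NumPy is installed alongside pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value
regions = pd.Series(['Region 2', 'Region 6', np.nan, 'Region 2', 'Region 3'],
                    name='Region')

total = len(regions)           # every row, including the missing one
non_missing = regions.count()  # excludes missing values, like "count" in .describe()
distinct = regions.nunique()   # distinct non-missing values
labels = regions.unique()      # distinct values; the missing value is listed too

print(total, non_missing, distinct)  # 5 4 3
print(labels)
```

*.nunique()* skips missing values by default, while *.unique()* keeps them, so the array can be one element longer than the count returned by *.nunique()*.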

## .sort_values()

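Load the "Region.csv" file and convert its single column to a Series with the *.squeeze("columns")* method (the *squeeze* option of *read_csv* is no longer available in pandas 2.0 and later). Store the information in a variable called **region_data**. Preview the data with the pandas *.head()* method.

In [8]:
data = pd.read_csv('Region.csv').squeeze("columns")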
region_data = data.copy()
region_data.head()

0    Region 2
1    Region 6
2    Region 3
3    Region 2
4    Region 3
Name: Region, dtype: object

Sort the values without specifying any arguments.

*Please note that the region numbers run from 1 to 18, but the values are strings, so they are sorted as text rather than as integers. "Region 1" is followed by "Region 10", "Region 11", and so on up to "Region 18". Then come "Region 2", "Region 3", and so on, with "Region 9" last.*

In [9]:
region_data.sort_values()

462    Region 1
347    Region 1
609    Region 1
610    Region 1
339    Region 1
         ...   
536    Region 9
450    Region 9
8      Region 9
842    Region 9
940    Region 9
Name: Region, Length: 1042, dtype: object
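
If a numeric ordering of the labels is needed, one possible approach is the *key* parameter of *.sort_values()* (available since pandas 1.1). The sketch below uses a small, made-up Series and assumes every label ends in a number; it is one option, not the only one.

```python
import pandas as pd

# Hypothetical toy data, not the contents of "Region.csv"
regions = pd.Series(['Region 2', 'Region 10', 'Region 1', 'Region 9'],
                    name='Region')

# Default behaviour: the strings are compared character by character
as_text = regions.sort_values()

# key receives the whole Series and must return a Series of the same length;
# here it extracts the digits and converts them to integers
as_numbers = regions.sort_values(
    key=lambda s: s.str.extract(r'(\d+)', expand=False).astype(int)
)

print(as_text.tolist())     # ['Region 1', 'Region 10', 'Region 2', 'Region 9']
print(as_numbers.tolist())  # ['Region 1', 'Region 2', 'Region 9', 'Region 10']
```

The key function only affects the ordering; the values stored in the result are still the original strings.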

Sort the values, setting the *ascending* parameter to *True*. This is the default, so the output matches the previous one.

In [10]:
region_data.sort_values(ascending=True)

462    Region 1
347    Region 1
609    Region 1
610    Region 1
339    Region 1
         ...   
536    Region 9
450    Region 9
8      Region 9
842    Region 9
940    Region 9
Name: Region, Length: 1042, dtype: object

Sort the values in **region_data** in descending order.

In [11]:
region_data.sort_values(ascending=False)

940    Region 9
842    Region 9
8      Region 9
450    Region 9
536    Region 9
         ...   
339    Region 1
610    Region 1
609    Region 1
347    Region 1
462    Region 1
Name: Region, Length: 1042, dtype: object

## Attribute and Method Chaining

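Load the "Region.csv" file and convert its single column to a Series with the *.squeeze("columns")* method (the *squeeze* option of *read_csv* is no longer available in pandas 2.0 and later). Store the information in a variable called **region_data**. Preview the data with the pandas *.head()* method.

In [12]:
data = pd.read_csv('Region.csv').squeeze("columns")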
region_data = data.copy()
region_data.head()

0    Region 2
1    Region 6
2    Region 3
3    Region 2
4    Region 3
Name: Region, dtype: object

Use method chaining to obtain the first five rows from the values of **region_data** sorted in ascending order.

In [14]:
region_data.sort_values().head()

462    Region 1
347    Region 1
609    Region 1
610    Region 1
339    Region 1
Name: Region, dtype: object

Use method chaining to obtain the last five rows from the values of **region_data** sorted in descending order.

In [15]:
region_data.sort_values(ascending=False).tail()

339    Region 1
610    Region 1
609    Region 1
347    Region 1
462    Region 1
Name: Region, dtype: object
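
Method chaining works because each method returns a new Series, on which the next method can be called directly. A minimal sketch with made-up values (no duplicates, so the row order is unambiguous) compares the step-by-step and the chained versions:

```python
import pandas as pd

# Hypothetical toy data, not the contents of "Region.csv"
regions = pd.Series(['Region 3', 'Region 1', 'Region 4', 'Region 2'],
                    name='Region')

# Step by step, with an intermediate variable
sorted_regions = regions.sort_values()
first_two = sorted_regions.head(2)

# The same result in a single chained expression
first_two_chained = regions.sort_values().head(2)

# The tail of the descending sort holds the same rows in reverse order
last_two_desc = regions.sort_values(ascending=False).tail(2)

print(first_two_chained.tolist())  # ['Region 1', 'Region 2']
print(last_two_desc.tolist())      # ['Region 2', 'Region 1']
```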

Execute the next code cell to create a Series object called **emp_birth_date** that contains the employee numbers and the dates of birth of four employees.

In [16]:
emp_birth_date = pd.Series({'No_0001':'1963-08-02', 'No_0002':'1964-06-13', 'No_0003':'1989-12-04', 'No_0004':'1996-04-08'})
emp_birth_date

No_0001    1963-08-02
No_0002    1964-06-13
No_0003    1989-12-04
No_0004    1996-04-08
dtype: object

Obtain just the index of **emp_birth_date**.

In [17]:
emp_birth_date.index

Index(['No_0001', 'No_0002', 'No_0003', 'No_0004'], dtype='object')

Obtain the row labels, i.e. the values of the index, as an array.

In [19]:
emp_birth_date.index.to_numpy()

array(['No_0001', 'No_0002', 'No_0003', 'No_0004'], dtype=object)
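
The expression above chains an attribute (*.index*) with a method (*.to_numpy()*). As a short sketch on a two-employee version of the same Series, the same pattern can return the labels as an array or as a plain Python list, and *.to_numpy()* called on the Series itself returns the data rather than the labels:

```python
import pandas as pd

# Shortened version of emp_birth_date, for illustration only
emp_birth_date = pd.Series({'No_0001': '1963-08-02', 'No_0002': '1964-06-13'})

labels = emp_birth_date.index.to_numpy()    # attribute, then method: labels as an array
label_list = emp_birth_date.index.tolist()  # labels as a plain Python list
values = emp_birth_date.to_numpy()          # the data itself, without the labels

print(label_list)       # ['No_0001', 'No_0002']
print(values.tolist())  # ['1963-08-02', '1964-06-13']
```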

## .sort_index()

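Load the "Region.csv" file and convert its single column to a Series with the *.squeeze("columns")* method (the *squeeze* option of *read_csv* is no longer available in pandas 2.0 and later). Store the information in a variable called **region_data**. Preview the data with the pandas *.head()* method.

In [20]:
data = pd.read_csv('Region.csv').squeeze("columns")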
region_data = data.copy()
region_data.head()

0    Region 2
1    Region 6
2    Region 3
3    Region 2
4    Region 3
Name: Region, dtype: object

Overwrite the content of the **region_data** Series as you sort the values in *descending* order. Observe the index of the Series. Is it still in its original order?

In [21]:
region_data = region_data.sort_values(ascending=False)
region_data

940    Region 9
842    Region 9
8      Region 9
450    Region 9
536    Region 9
         ...   
339    Region 1
610    Region 1
609    Region 1
347    Region 1
462    Region 1
Name: Region, Length: 1042, dtype: object

Sort the index of the new version of the **region_data** Series object.

In [22]:
region_data.sort_index()

0       Region 2
1       Region 6
2       Region 3
3       Region 2
4       Region 3
          ...   
1037    Region 6
1038    Region 1
1039    Region 4
1040    Region 6
1041    Region 6
Name: Region, Length: 1042, dtype: object

Sort the index of the **region_data** Series in the opposite direction. You can do that by setting the *ascending* parameter to *False*.

In [23]:
region_data.sort_index(ascending=False)

1041    Region 6
1040    Region 6
1039    Region 4
1038    Region 1
1037    Region 6
          ...   
4       Region 3
3       Region 2
2       Region 3
1       Region 6
0       Region 2
Name: Region, Length: 1042, dtype: object
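
The interplay between *.sort_values()* and *.sort_index()* is easier to follow on a Series small enough to read in full. The sketch below uses made-up values with the default integer index; it shows that the index labels travel with their values, and that sorting the index restores the original row order.

```python
import pandas as pd

# Hypothetical toy data, not the contents of "Region.csv"
regions = pd.Series(['Region 2', 'Region 6', 'Region 3', 'Region 1'],
                    name='Region')

by_value = regions.sort_values(ascending=False)       # index labels follow their values
restored = by_value.sort_index()                      # back to the original row order
reversed_rows = by_value.sort_index(ascending=False)  # highest index label first

print(by_value.index.tolist())       # [1, 2, 0, 3]
print(restored.index.tolist())       # [0, 1, 2, 3]
print(reversed_rows.index.tolist())  # [3, 2, 1, 0]
```

Both methods return a new Series and leave the original untouched, which is why the exercise above assigns the sorted result back to **region_data**.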