
# 1. DEFINE A PANDAS SERIES (WITH NUMERIC DEFAULT INDEX)

In [37]:
# Pandas is a data manipulation and analysis library built on top of NumPy.
# Its main data structure is the DataFrame (think of it as Microsoft Excel in Python).
# DataFrames let programmers store and manipulate data in tabular form (rows and columns).
# Series vs. DataFrame? A Series can be thought of as a single column of a DataFrame.
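
The Series/DataFrame relationship described above can be sketched as follows. This is a minimal illustration; the column names `ticker` and `price` and the values are made up for the example and do not come from the course data.

```python
import pandas as pd

# Hypothetical two-column table; names and values are illustrative only
df = pd.DataFrame({"ticker": ["NVDA", "MSFT"], "price": [100, 200]})

# Selecting a single column returns a Series that keeps the DataFrame's row index
price_column = df["price"]

print(type(df).__name__)            # DataFrame
print(type(price_column).__name__)  # Series
```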

In [38]:
import pandas as pd

In [39]:
# Let's define a Python list that contains 5 stocks: Nvidia, Microsoft, Facebook, Amazon, and Boeing
my_list = ['NVDA', 'MSFT', 'FB', 'AMZN', 'BA']
my_list

['NVDA', 'MSFT', 'FB', 'AMZN', 'BA']

In [40]:
# Let's confirm the Datatype
type(my_list)

list

In [41]:
# Let's create a one-dimensional Pandas Series
# Let's use the Pandas constructor to create a Series from a Python list
# Note that a Series consists of data and an associated index (a numeric index is generated automatically)
# Check the Pandas documentation for more information: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series
# The object datatype is used for text data (strings)
series_1 = pd.Series(data = my_list)
series_1

0    NVDA
1    MSFT
2      FB
3    AMZN
4      BA
dtype: object


In [42]:
# Let's confirm the Pandas Series Datatype
type(series_1)

In [43]:
# Let's define another Pandas Series that contains numeric values (stock prices) instead of text data
# Note the int64 datatype: each value is an integer stored in 64 bits of memory
series_2 = pd.Series(data = [100, 200, 500, 1000, 5000])
series_2

0     100
1     200
2     500
3    1000
4    5000
dtype: int64


**MINI CHALLENGE #1:**
- **Define a Pandas Series named "my_series" that contains your top 3 favourite movies. Confirm the datatype of "my_series"**

In [44]:
my_series = pd.Series(data = ['A', 'B', 'C'])
my_series


0    A
1    B
2    C
dtype: object


# 2. DEFINE A PANDAS SERIES WITH CUSTOM INDEX

In [45]:
# Let's define a Python list that contains 5 stocks: Nvidia, Microsoft, Facebook, Amazon, and Boeing
my_list = ['NVDA', 'MSFT', 'FB', 'AMZN', 'BA']
my_list

['NVDA', 'MSFT', 'FB', 'AMZN', 'BA']

In [46]:
# Let's define a python list as shown below. This python list will be used for the Series index:
my_labels = ['stock#1', 'stock#2', 'stock#3', 'stock#4', 'stock#5']
my_labels

['stock#1', 'stock#2', 'stock#3', 'stock#4', 'stock#5']

In [47]:
# Let's create a one-dimensional Pandas Series
# Let's use the Pandas constructor to create a Series from a Python list
# Note that this Series consists of data and the associated custom labels
series_3 = pd.Series(data = my_list, index = my_labels)

In [48]:
# Let's view the series
series_3

stock#1    NVDA
stock#2    MSFT
stock#3      FB
stock#4    AMZN
stock#5      BA
dtype: object


In [49]:
# Let's obtain the datatype
type(series_3)

**MINI CHALLENGE #2:**
- **Define a Pandas Series named "my_series" that contains your top 3 favourite movies. Instead of using default numeric indexes (similar to mini challenge #1), use the following indexes "movie #1", "Movie #2", and "movie #3"**

In [50]:
my_series = pd.Series(data = ['movie1', 'movie2', 'movie3'], index = ['movie #1', 'Movie #2', 'movie #3'])
my_series

movie #1    movie1
Movie #2    movie2
movie #3    movie3
dtype: object


# 3. DEFINE A PANDAS SERIES FROM A DICTIONARY

In [51]:
# A Dictionary consists of a collection of key-value pairs. Each key-value pair maps the key to its corresponding value.
# Keys are unique within a dictionary while values may not be.
# List elements are accessed by their position in the list, via indexing while Dictionary elements are accessed via keys
# Define a dictionary named "my_dict" using key-value pairs

my_dict = {'Bank Client ID': 111,
           'Bank Client Name': 'Steve',
           'Net Worth [$]': 3500,
           'Years with Bank': 9}

In [52]:
# Show the dictionary
my_dict

{'Bank Client ID': 111,
 'Bank Client Name': 'Steve',
 'Net Worth [$]': 3500,
 'Years with Bank': 9}

In [53]:
# Confirm the dictionary datatype
type(my_dict)

dict

In [54]:
# Let's define a Pandas Series Using the dictionary
series_4 = pd.Series(my_dict)
series_4

Bank Client ID        111
Bank Client Name    Steve
Net Worth [$]        3500
Years with Bank         9
dtype: object


**MINI CHALLENGE #3:**
- **Create a Pandas Series from a dictionary with 3 of your favourite stocks and their corresponding prices**

In [55]:
my_stock = {'Nvidia': 500, 'Microsoft': 450, 'Samsung Electronics': 200}
pd.Series(my_stock)

Nvidia                 500
Microsoft              450
Samsung Electronics    200
dtype: int64


# 4. PANDAS ATTRIBUTES

In [56]:
# Attributes/Properties: do not use parentheses "()" and are used to read properties of a Pandas Series. Ex: my_series.values, my_series.shape
# Methods: use parentheses "()" and may take arguments; most return a new object rather than modifying the Series in place. Ex: my_series.tail(), my_series.head(), my_series.drop_duplicates()
# Indexers: use square brackets "[]" and are used to access specific elements in a Pandas Series or DataFrame. Ex: my_series.loc[], my_series.iloc[]

# Let's redefine a Pandas Series containing our favourite 5 stocks
# Nvidia, Microsoft, Facebook, Amazon, and Boeing

my_list = ['NVDA', 'MSFT', 'FB', 'AMZN', 'BA']
my_series = pd.Series(data = my_list)
my_series

0    NVDA
1    MSFT
2      FB
3    AMZN
4      BA
dtype: object
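
To make the three kinds of access described at the top of this section concrete, here is a small sketch. The variable names (`stocks`, `shape_value`, `first_two`, `third`) are illustrative and not part of the course notebook.

```python
import pandas as pd

stocks = pd.Series(["NVDA", "MSFT", "FB", "AMZN", "BA"])

shape_value = stocks.shape   # attribute: no parentheses, just reads a property
first_two = stocks.head(2)   # method: parentheses, returns a NEW Series
third = stocks.iloc[2]       # indexer: square brackets, selects by position

# The original Series still has all 5 elements after calling head()
print(shape_value, len(first_two), third, len(stocks))
```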


In [57]:
# The ".values" attribute returns the Series data as a NumPy ndarray (or an ndarray-like object, depending on the dtype)
# Check this for more information: https://pandas.pydata.org/docs/reference/api/pandas.Series.values.html#pandas.Series.values
my_series.values

array(['NVDA', 'MSFT', 'FB', 'AMZN', 'BA'], dtype=object)

In [58]:
# index is used to return the index (axis labels) of the Series
my_series.index

RangeIndex(start=0, stop=5, step=1)

In [59]:
# dtype is used to return the datatype of the Series ('O' stands for 'object' datatype)
my_series.dtype

dtype('O')

In [60]:
# Check if all elements are unique or not
my_series.is_unique

True

In [61]:
# Check the shape of the Series
# note that a Series is one dimensional
my_series.shape

(5,)

**MINI CHALLENGE #4:**
- **What is the size of the Pandas Series? (External Research for the proper attribute is Required)**

In [62]:
my_series.size

5

# 5. PANDAS METHODS

In [63]:
# Methods use parentheses "()" and may take arguments. Ex: my_series.tail(), my_series.head(), my_series.drop_duplicates()
# Most methods return a new value or object and leave the original Pandas Series unchanged

# Let's define another Pandas Series that contains numeric values (stock prices) instead of text data
# Note that we have int64 datatype which means it contains integer values stored in 64 bits in memory

my_series = pd.Series(data = [100, 200, 500, 1000, 5000])
my_series

0     100
1     200
2     500
3    1000
4    5000
dtype: int64


In [64]:
# Let's obtain the sum of all elements in the Pandas Series
my_series.sum()

np.int64(6800)

In [65]:
# Let's obtain the product of all elements in the Pandas Series
my_series.product()

np.int64(50000000000000)

In [66]:
# Let's obtain the average
my_series.mean()

np.float64(1360.0)

In [67]:
# Let's show the first couple of elements in the Pandas Series
my_series.head(2)

0    100
1    200
dtype: int64


In [68]:
# Note that head returns a new Series; the original Series is left unchanged
new_series = my_series.head(3)
new_series

0    100
1    200
2    500
dtype: int64


**MINI CHALLENGE #5:**
- **Show the last 2 rows in the Pandas Series (External Research is Required)**
- **How many bytes does this Pandas Series consume in memory? (External Research is Required)**

In [69]:
my_series.tail(2)
my_series.info()
my_series.memory_usage(deep=True)

<class 'pandas.core.series.Series'>
RangeIndex: 5 entries, 0 to 4
Series name: None
Non-Null Count  Dtype
--------------  -----
5 non-null      int64
dtypes: int64(1)
memory usage: 172.0 bytes


172

# 6. IMPORT CSV DATA (1-D) USING PANDAS

In [70]:
# Pandas read_csv reads a CSV file and stores the data in a DataFrame by default (DataFrames will be covered shortly!)
# To get a one-dimensional Pandas Series instead, call .squeeze("columns") on the result (the squeeze argument of read_csv was removed in pandas 2.0)
# Notice that a Series is displayed as plain text, without table formatting
import pandas as pd
sp500 = pd.read_csv('S_P500_Prices.csv')
sp500

FileNotFoundError: [Errno 2] No such file or directory: 'S_P500_Prices.csv'
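
Since the CSV file is not available in this run, the DataFrame-to-Series conversion can be sketched with an in-memory stand-in. The single column name `sp500` matches the name used later in this notebook; the price values are made up, and the real file layout is an assumption.

```python
import io
import pandas as pd

# Stand-in for the CSV file: one column named "sp500" (assumed layout, made-up prices)
csv_text = "sp500\n2700.5\n2710.25\n2695.0\n"

as_frame = pd.read_csv(io.StringIO(csv_text))  # DataFrame with a single column
as_series = as_frame.squeeze("columns")        # collapse that single column into a Series

print(type(as_frame).__name__, type(as_series).__name__)
```

If the file had more than one column, `squeeze("columns")` would leave the DataFrame unchanged, so this only yields a Series for 1-D data.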

In [None]:
from google.colab import drive
drive.mount('/content/drive')

**MINI CHALLENGE #6:**
- **Set Squeeze = False and rerun the cell, what do you notice? Use Type to compare both outputs**

In [None]:
type(sp500)

# 7. PANDAS BUILT-IN FUNCTIONS

In [71]:
# Pandas works well with built-in Python functions
# You are not limited to Pandas methods; you can apply Python functions directly
# Check Python built-in functions here: https://docs.python.org/3/library/functions.html
sp500 = pd.read_csv('S_P500_Prices.csv')
sp500

FileNotFoundError: [Errno 2] No such file or directory: 'S_P500_Prices.csv'

In [None]:
# Obtain the Data Type of the Pandas Series
type(sp500)

In [None]:
# Obtain the length of the Pandas Series
len(sp500)

In [None]:
# Obtain the maximum value of the Pandas Series
max(sp500['sp500'])

In [None]:
# Obtain the minimum value of the Pandas Series
min(sp500['sp500'])

**MINI CHALLENGE #7:**
- **Given the following Pandas Series, convert all negative values to positive using python built-in functions**
- **Obtain only unique values (ie: Remove duplicates) using python built-in functions**
- **my_series = pd.Series(data = [-10, 100, -30, 50, 100])**


In [76]:
my_series = pd.Series(data = [-10, 100, -30, 50, 100])
print(my_series.abs())
print(set(my_series))


0     10
1    100
2     30
3     50
4    100
dtype: int64
{-30, 100, 50, -10}


# 8. SORTING PANDAS SERIES

In [None]:
# Let's import CSV data as follows:


In [None]:
# You can sort the values in the Series as follows


In [None]:
# Let's view the Pandas Series again after sorting. Note that nothing changed in memory! To modify the Series itself, set inplace = True


In [None]:
# Set inplace = True to ensure that change has taken place in memory


In [None]:
# Note that now the change (ordering) took place


In [None]:
# Notice that the indexes are now changed
# You can also sort by index (revert back to the original Pandas Series) as follows:
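
The empty cells in this section can be filled along these lines. This sketch uses a small made-up Series named `prices` instead of the S&P500 file, which is not available here.

```python
import pandas as pd

prices = pd.Series([2700.5, 2710.25, 2695.0, 2720.75])  # made-up values

# sort_values returns a new, sorted Series; `prices` itself is unchanged
sorted_copy = prices.sort_values()
unchanged = prices.tolist()

# inplace=True reorders `prices` itself; index labels travel with their values
prices.sort_values(inplace=True)
labels_after_sort = prices.index.tolist()

# sort_index puts the labels back in order, restoring the original Series
prices.sort_index(inplace=True)
restored = prices.tolist()

print(unchanged, labels_after_sort, restored)
```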


**MINI CHALLENGE #8:**
- **Sort the S&P500 values in descending order instead. Make sure to update values in-memory.**

# 9. PERFORM MATH OPERATIONS ON PANDAS SERIES

In [None]:
# Let's import CSV data as follows:


In [None]:
# Apply Sum Method on Pandas Series


In [None]:
# Apply count Method on Pandas Series


In [None]:
# Obtain the maximum value


In [None]:
# Obtain the minimum value


In [None]:
# My favourite: Describe!
# Describe is used to obtain all statistical information in one place
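
A possible way to fill in the cells above, sketched on the small stock-price Series from section 5 rather than on the S&P500 file (which is not available in this run):

```python
import pandas as pd

prices = pd.Series([100, 200, 500, 1000, 5000])

total = prices.sum()          # sum of all elements
count = prices.count()        # number of non-null elements
highest = prices.max()        # maximum value
lowest = prices.min()         # minimum value
summary = prices.describe()   # count, mean, std, min, quartiles and max in one Series

print(summary)
```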


**MINI CHALLENGE #9:**
- **Obtain the average price of the S&P500 using two different methods**

# 10. CHECK IF A GIVEN ELEMENT EXISTS IN A PANDAS SERIES

In [None]:
# Let's import CSV data as follows:


In [None]:
# Check if a given number exists in a Pandas Series values
# Returns a boolean "True" or "False"


In [None]:
# Check if a given number exists in a Pandas Series index


In [None]:
# Note that by default 'in' will search in Pandas index and not values
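
The difference between searching values and searching the index can be sketched as follows, using a made-up Series named `prices` with the default numeric index.

```python
import pandas as pd

prices = pd.Series([2700.5, 2710.25, 2695.0])  # made-up values, index labels 0, 1, 2

in_values = 2695.0 in prices.values   # searches the data
in_index = 2 in prices.index          # searches the index labels
plain_in_value = 2695.0 in prices     # plain `in` looks at the index, so a value is not found
plain_in_label = 1 in prices          # plain `in` finds an existing label

print(in_values, in_index, plain_in_value, plain_in_label)
```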


**MINI CHALLENGE #10:**
- **Check if the stock price 3349 exists in the sp500 Pandas Series or not**
- **Round stock prices to the nearest integer and check again**

# 11. INDEXING: OBTAIN SPECIFIC ELEMENTS FROM PANDAS SERIES

In [None]:
# Let's import CSV data as follows:


In [None]:
# Obtain the first element in a Pandas Series
# Note that first element has an index 0


In [None]:
# Obtain the last element in the Pandas Series
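
One way to fill in the two cells above is with the positional indexer `.iloc`, shown here on a made-up Series named `prices`. Using `.iloc` is a choice made for this sketch, since it selects by position regardless of the index labels.

```python
import pandas as pd

prices = pd.Series([2700.5, 2710.25, 2695.0, 2720.75])  # made-up values

first = prices.iloc[0]    # position 0 is the first element
last = prices.iloc[-1]    # negative positions count from the end

# Note: with the default numeric index, prices[-1] would look for a LABEL -1 and raise a KeyError
print(first, last)
```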


**MINI CHALLENGE #11:**
- **Obtain the fifth element in the Pandas Series**

# 12. SLICING: OBTAIN MULTIPLE ELEMENTS FROM PANDAS SERIES

In [None]:
# Let's import CSV data as follows:


In [None]:
# Slice elements from a Pandas Series
# Let's obtain elements starting from index 0 up until and not including index 5 (ie: indexes 0-4)


In [None]:
# obtain all elements starting from index 0 up until and not including index 10


In [None]:
# obtain all elements starting from index 5 up until the end of the Pandas Series
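
The slices described above can be sketched on a short made-up Series named `prices`. As with Python lists, the stop position is excluded.

```python
import pandas as pd

prices = pd.Series([10, 20, 30, 40, 50, 60, 70])  # made-up values

first_five = prices[0:5]      # positions 0 to 4 (position 5 is excluded)
from_five = prices.iloc[5:]   # position 5 to the end, written with the explicit positional indexer

print(first_five.tolist(), from_five.tolist())
```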


**MINI CHALLENGE #12:**
- **Obtain all elements in Pandas Series except for the last 3 elements**

# EXCELLENT JOB!

# MINI CHALLENGE SOLUTIONS

**MINI CHALLENGE #1 SOLUTION:**
- **Define a Pandas Series named "my_series" that contains your top 3 favourite movies. Confirm the datatype of "my_series"**

In [None]:
# Let's define a Python list that contains 3 top movies
my_list = ['The Godfather','Star Wars','The Wolf of Wall Street']
my_series = pd.Series(data = my_list)
my_series

In [None]:
type(my_series)

**MINI CHALLENGE #2 SOLUTION:**
- **Define a Pandas Series named "my_series" that contains your top 3 favourite movies. Instead of using default numeric indexes (similar to mini challenge #1), use the following indexes "movie #1", "Movie #2", and "movie #3"**

In [None]:
# Let's define a Python list that contains 3 movies as follows
my_list = ['The Godfather','Star Wars','The Wolf of Wall Street']

# Let's define a python list as shown below. This python list will be used for the Series index:
my_labels = ['movie #1', 'movie #2', 'movie #3']


In [None]:
# Let's create a one dimensional Pandas "series"
# Let's use Pandas Constructor Method to create a series from a Python list
# Note that this series is formed of data and associated labels
my_series = pd.Series(data = my_list, index = my_labels)
my_series

**MINI CHALLENGE #3 SOLUTION:**
- **Create a Pandas Series from a dictionary with 3 of your favourite stocks and their corresponding prices**


In [None]:
stocks = {'SP500': 3000,
          'AAPL': 400,
          'TSLA': 2200}
print(stocks)

In [None]:
# Let's define a Pandas Series Using the dictionary
my_series = pd.Series(stocks)
my_series

**MINI CHALLENGE #4 SOLUTION:**
- **What is the size of the Pandas Series? (External Research is Required)**

In [None]:
# size is used to return the size of the series
my_series.size

**MINI CHALLENGE #5 SOLUTION:**
- **Show the last 2 rows in the Pandas Series (External Research is Required)**
- **How many bytes does this Pandas Series consume in memory? (External Research is Required)**

In [None]:
my_series.tail(2)

In [None]:
my_series.memory_usage()

**MINI CHALLENGE #6 SOLUTION:**
- **Set Squeeze = False and rerun the cell, what do you notice? Use Type to compare both outputs**

In [None]:
sp500 = pd.read_csv('S&P500_Prices.csv')
# Note: the squeeze argument of read_csv was removed in pandas 2.0; leaving it out is equivalent to the old squeeze = False
# Without squeezing, the data is stored in a DataFrame by default.
# A DataFrame stores multi-dimensional data, as compared to a Pandas Series, which only holds a 1-D dataset
# Note that DataFrames are rendered as formatted tables when you view them, as shown below
# Note that a Pandas Series is displayed as plain text


In [None]:
sp500

In [None]:
type(sp500)

In [None]:
# Equivalent of the old squeeze = True: collapse a single-column DataFrame into a Series
sp500 = pd.read_csv('S&P500_Prices.csv').squeeze('columns')
type(sp500)

**MINI CHALLENGE #7 SOLUTION:**
- **Given the following Pandas Series, convert all negative values to positive using python built-in functions**
- **Obtain only unique values (ie: Remove duplicates) using python built-in functions**
- **my_series = pd.Series(data = [-10, 100, -30, 50, 100])**


In [None]:
my_series = pd.Series(data = [-10, 100, -30, 50, 100])
my_series

In [None]:
abs(my_series)

In [None]:
set(my_series)

**MINI CHALLENGE #8 SOLUTION:**
- **Sort the S&P500 values in descending order instead. Make sure to update values in-memory.**

In [None]:
sp500.sort_values(ascending = False, inplace = True)
sp500

**MINI CHALLENGE #9 SOLUTION:**
- **Obtain the average price using two different methods**

In [None]:
# Obtain the average - Solution #1
sp500.sum()/sp500.count()

In [None]:
# Obtain the average - Solution #2
sp500.mean()

**MINI CHALLENGE #10 SOLUTION:**
- **Check if the stock price 3349 exists in the sp500 Pandas Series or not**
- **Round stock prices to the nearest integer and check again**

In [None]:
3349 in sp500.values

In [None]:
sp500 = round(sp500)
sp500

In [None]:
3349 in sp500.values

**MINI CHALLENGE #11 SOLUTION:**
- **Obtain the fifth element in the Pandas Series**

In [None]:
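# Note that the fifth element is at position 4
# .iloc selects by position, so this still works if the index labels were reordered by sorting
sp500.iloc[4]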

**MINI CHALLENGE #12 SOLUTION:**
- **Obtain all elements in Pandas Series except for the last 3 elements**

In [None]:
sp500[:-3]