# TASK #1. DEFINE A PANDAS SERIES

In [2]:
# Pandas is a data manipulation and analysis library built on top of NumPy.
# Its main data structure is the DataFrame (think of it as Microsoft Excel in Python).
# DataFrames let programmers store and manipulate data in a tabular fashion (rows and columns).
# Series vs. DataFrame? A Series can be thought of as a single column of a DataFrame.

import pandas as pd

In [3]:
# Let's define a Python list that contains 5 cryptocurrencies
cryptos = ['BTC', 'XRP', 'LTC', 'ADA', 'ETH']
cryptos

['BTC', 'XRP', 'LTC', 'ADA', 'ETH']

In [4]:
# Let's confirm the Datatype
type(cryptos)

list

In [6]:
# Let's create a one-dimensional Pandas Series
# Let's use the Pandas constructor pd.Series() to create a Series from a Python list
# Note that a Series consists of data and an associated index (a numeric index is generated automatically)
# Check the Pandas documentation for more information: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series
# The object datatype is used for text data (strings)
cryptos_series = pd.Series(data = cryptos)
cryptos_series

0    BTC
1    XRP
2    LTC
3    ADA
4    ETH
dtype: object

In [7]:
# Let's confirm the Pandas Series Datatype
type(cryptos_series)

pandas.core.series.Series

In [8]:
# Let's define another Pandas Series that contains numeric values (crypto prices) instead of text data
# Note the int64 datatype, which means each value is an integer stored in 64 bits of memory
crypto_prices_series = pd.Series(data = [200,500,2000,20,50])
crypto_prices_series

0     200
1     500
2    2000
3      20
4      50
dtype: int64

In [9]:
type(crypto_prices_series)

pandas.core.series.Series
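
As a small sketch of the datatype point above, the snippet below builds one Series from integers and one that includes a float, then inspects the dtype Pandas infers. The names `int_prices` and `float_prices` and their values are illustrative and not part of the original notebook.

```python
import pandas as pd

# Hypothetical example data (not taken from the notebook's CSV file)
int_prices = pd.Series(data=[200, 500, 2000, 20, 50])
float_prices = pd.Series(data=[200.5, 500, 2000, 20, 50])

# A list of Python integers is stored as 64-bit integers
print(int_prices.dtype)    # int64

# A single float in the list makes the whole Series float64
print(float_prices.dtype)  # float64

# Every Series also carries an index; by default it is 0, 1, 2, ...
print(list(int_prices.index))  # [0, 1, 2, 3, 4]
```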

**MINI CHALLENGE #1:**
- **Define a Pandas Series named "my_series" that contains your top 3 favourite stocks. Confirm the datatype of "my_series"**

# TASK #2. DEFINE A PANDAS SERIES WITH CUSTOM INDEX

In [7]:
# Let's define a Python list that contains 5 cryptocurrencies


['BTC', 'XRP', 'LTC', 'ADA', 'ETH']

In [10]:
# Let's define a Python list as shown below. This list will be used for the Series index:
index = ['crypto#1', 'crypto#2', 'crypto#3', 'crypto#4', 'crypto#5']

In [11]:
# Let's create a one-dimensional Pandas Series
# Let's use the Pandas constructor pd.Series() to create a Series from a Python list
# Note that this Series consists of data and associated custom labels
cryptos_series = pd.Series(data = cryptos, index = index)

In [12]:
# Let's view the series
cryptos_series

crypto#1    BTC
crypto#2    XRP
crypto#3    LTC
crypto#4    ADA
crypto#5    ETH
dtype: object

In [13]:
# Let's obtain the datatype
type(cryptos_series)

pandas.core.series.Series
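
One reason to attach custom labels is that elements can then be retrieved by label as well as by position. The sketch below rebuilds the same Series and uses the `.loc` and `.iloc` indexers; the variable names `by_label` and `by_position` are illustrative.

```python
import pandas as pd

cryptos = ['BTC', 'XRP', 'LTC', 'ADA', 'ETH']
index = ['crypto#1', 'crypto#2', 'crypto#3', 'crypto#4', 'crypto#5']
cryptos_series = pd.Series(data=cryptos, index=index)

# Label-based access uses the custom index
by_label = cryptos_series.loc['crypto#3']

# Position-based access ignores the labels and counts from 0
by_position = cryptos_series.iloc[0]

print(by_label)     # LTC
print(by_position)  # BTC
```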

**MINI CHALLENGE #2:**
- **Define a Pandas Series named "my_series" that contains your top 3 favourite stocks. Instead of using default numeric indexes (similar to mini challenge #1), use the following indexes "stock #1", "stock #2", and "stock #3"**

# TASK #3. DEFINE A PANDAS SERIES FROM A DICTIONARY

In [12]:
# A Dictionary consists of a collection of key-value pairs. Each key-value pair maps the key to its corresponding value.
# Keys are unique within a dictionary while values may not be. 
# List elements are accessed by their position in the list, via indexing while Dictionary elements are accessed via keys
# Define a dictionary named "my_dict" using key-value pairs


In [14]:
# Define the dictionary and show it
my_dict = {'Employee ID': 1,
 'Employee Name': 'Steve',
 'Salary [$]': 2000,
 'Years with company': 10}
my_dict

{'Employee ID': 1,
 'Employee Name': 'Steve',
 'Salary [$]': 2000,
 'Years with company': 10}

In [15]:
# Confirm the dictionary datatype 
type(my_dict)

dict

In [16]:
# Let's define a Pandas Series Using the dictionary
employee_series = pd.Series(my_dict)
employee_series

Employee ID               1
Employee Name         Steve
Salary [$]             2000
Years with company       10
dtype: object
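
The following sketch shows what happens to the dictionary keys when a Series is built from them: they become the index labels, so values can be looked up by key just as in the dictionary. The `numeric_series` variable is an illustrative addition, included to contrast a numbers-only dictionary with the mixed-type one above.

```python
import pandas as pd

my_dict = {'Employee ID': 1,
           'Employee Name': 'Steve',
           'Salary [$]': 2000,
           'Years with company': 10}
employee_series = pd.Series(my_dict)

# Dictionary keys become the index labels, in insertion order
print(list(employee_series.index))

# Values are looked up by label, much like in the dictionary itself
print(employee_series['Employee Name'])  # Steve

# A dictionary holding only integers yields a numeric dtype instead of object
numeric_series = pd.Series({'Salary [$]': 2000, 'Years with company': 10})
print(numeric_series.dtype)  # int64
```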

**MINI CHALLENGE #3:**
- **Create a Pandas Series from a dictionary with 3 of your favourite stocks and their corresponding prices** 

# TASK #4. PANDAS ATTRIBUTES

In [19]:
# Attributes/Properties: do not use parentheses "()" and return properties of a Pandas Series. Ex: my_series.values, my_series.shape
# Methods: use parentheses "()" and might take arguments. Most return a new object and leave the original Series unchanged (some accept inplace=True to modify it). Ex: my_series.tail(), my_series.head(), my_series.drop_duplicates()
# Indexers: use square brackets "[]" and are used to access specific elements in a Pandas Series or DataFrame. Ex: my_series.loc[], my_series.iloc[]

# Let's view the Pandas Series containing our 5 favourite cryptos
cryptos_series

crypto#1    BTC
crypto#2    XRP
crypto#3    LTC
crypto#4    ADA
crypto#5    ETH
dtype: object

In [21]:
# The ".values" attribute returns the Series as an ndarray (or ndarray-like object), depending on its dtype
# Check this for more information: https://pandas.pydata.org/docs/reference/api/pandas.Series.values.html#pandas.Series.values
cryptos_series.values

array(['BTC', 'XRP', 'LTC', 'ADA', 'ETH'], dtype=object)

In [22]:
# index is used to return the index (axis labels) of the Series
cryptos_series.index

Index(['crypto#1', 'crypto#2', 'crypto#3', 'crypto#4', 'crypto#5'], dtype='object')

In [25]:
# dtype is used to return the datatype of the Series ('O' stands for 'object' datatype)
cryptos_series.dtype

dtype('O')

In [26]:
# Check if all elements are unique or not
cryptos_series.is_unique

True

In [27]:
# Check the shape of the Series
# note that a Series is one dimensional
cryptos_series.shape

(5,)
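
To see these attributes give a different answer, the sketch below applies them to a Series that contains a repeated value. The `tickers` Series is a made-up example, not data from the notebook.

```python
import pandas as pd

# Hypothetical Series with a repeated value
tickers = pd.Series(data=['BTC', 'XRP', 'BTC'])

print(tickers.is_unique)    # False, because 'BTC' appears twice
print(tickers.shape)        # (3,)
print(tickers.ndim)         # 1, a Series is always one-dimensional
print(len(tickers.values))  # 3
```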

**MINI CHALLENGE #4:** 
- **What is the size of the Pandas Series? (External Research for the proper attribute is Required)**

In [28]:
cryptos_series.size

5

# TASK #5. PANDAS METHODS

In [29]:
# Methods use parentheses "()" and might take arguments. Ex: my_series.tail(), my_series.head(), my_series.drop_duplicates()
# Most methods return a new object and leave the original Pandas Series unchanged

# Let's reuse the Pandas Series that contains numeric values (crypto prices) instead of text data
# Note the int64 datatype, which means it contains integer values stored in 64 bits of memory
crypto_prices_series

0     200
1     500
2    2000
3      20
4      50
dtype: int64

In [30]:
# Let's obtain the sum of all elements in the Pandas Series
crypto_prices_series.sum()

2770

In [31]:
# Let's obtain the product of all elements in the Pandas Series
crypto_prices_series.product()

200000000000

In [32]:
# Let's obtain the average
crypto_prices_series.mean()

554.0

In [33]:
# Let's show the first couple of elements in the Pandas Series
crypto_prices_series.head(2)

0    200
1    500
dtype: int64

In [27]:
# Note that head returns a new Series; the original Series is unchanged


0     400
1     500
2    1500
dtype: int64
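
A minimal sketch of this behaviour, using the price values defined earlier in this notebook: `head()` hands back a new, shorter Series and the original keeps all of its elements. The name `first_three` is illustrative.

```python
import pandas as pd

crypto_prices_series = pd.Series(data=[200, 500, 2000, 20, 50])

# head() returns a new, shorter Series
first_three = crypto_prices_series.head(3)

print(len(first_three))           # 3
print(len(crypto_prices_series))  # 5, the original is untouched

# Because a Series comes back, further methods can be chained onto it
print(crypto_prices_series.head(3).sum())  # 2700
```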

**MINI CHALLENGE #5:** 
- **Show the last 2 rows in the Pandas Series (External Research is Required)** 
- **How many bytes does this Pandas Series consume in memory? (External Research is Required)**

In [34]:
crypto_prices_series.tail(2)

3    20
4    50
dtype: int64

In [39]:
crypto_prices_series.memory_usage()

168
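
The reported total includes both the values and the index. As a rough sketch, the `index` argument of `memory_usage()` can separate the two: five int64 values take 5 x 8 = 40 bytes, and the remainder is index overhead, which may differ between Pandas versions. The names `values_bytes` and `total_bytes` are illustrative.

```python
import pandas as pd

crypto_prices_series = pd.Series(data=[200, 500, 2000, 20, 50], dtype='int64')

# Values only: 5 elements x 8 bytes each
values_bytes = crypto_prices_series.memory_usage(index=False)
print(values_bytes)  # 40

# Values plus index; the index share depends on the pandas version
total_bytes = crypto_prices_series.memory_usage(index=True)
print(total_bytes - values_bytes)
```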

# TASK #6. IMPORT CSV DATA (1-D) USING PANDAS

In [8]:
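# Pandas read_csv reads a CSV file and stores the data in a DataFrame by default (DataFrames will be covered shortly!)
# Use squeeze = True to convert a single-column result into a Pandas Series (one-dimensional)
# Notice that no table formatting is applied when a Series is displayed
# Note: the squeeze argument was deprecated in pandas 1.4 and removed in pandas 2.0;
# on newer versions, use pd.read_csv('crypto.csv').squeeze('columns') instead
BTC_price_series = pd.read_csv('crypto.csv', squeeze = True)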

In [9]:
BTC_price_series

0         457.334015
1         424.440002
2         394.795990
3         408.903992
4         398.821014
            ...     
2380    55950.746090
2381    57750.199220
2382    58917.691410
2383    58918.832030
2384    59095.808590
Name: BTC-USD Price, Length: 2385, dtype: float64

**MINI CHALLENGE #6:**
- **Set squeeze = False and rerun the cell. What do you notice? Use type() to compare both outputs**

In [10]:
type(BTC_price_series)

pandas.core.series.Series
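
On Pandas versions where `read_csv` no longer accepts `squeeze`, the same result can be obtained with the `DataFrame.squeeze()` method. The sketch below assumes a single-column CSV and uses an in-memory string as a stand-in for `crypto.csv`, so the three prices shown are only sample rows.

```python
import io
import pandas as pd

# Hypothetical stand-in for crypto.csv: a single-column CSV held in memory
csv_text = "BTC-USD Price\n457.334015\n424.440002\n394.795990\n"

# read_csv returns a DataFrame, even for a single column
df = pd.read_csv(io.StringIO(csv_text))
print(type(df))

# DataFrame.squeeze("columns") collapses a one-column DataFrame into a Series
series = df.squeeze("columns")
print(type(series))
print(series.name)  # BTC-USD Price
```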

# TASK #7. PANDAS BUILT-IN FUNCTIONS

In [30]:
# Pandas works well with Python's built-in functions
# You don't have to rely on Pandas methods only; you can apply Python functions directly to a Series
# Check Python built-in functions here: https://docs.python.org/3/library/functions.html


In [31]:
# Obtain the Data Type of the Pandas Series


pandas.core.series.Series

In [15]:
# Obtain the number of non-null elements in the Pandas Series (equal to its length when no values are missing)
BTC_price_series.count()

2385

In [16]:
# Obtain the maximum value of the Pandas Series
BTC_price_series.max()

61243.08594

In [17]:
# Obtain the minimum value of the Pandas Series
BTC_price_series.min()

178.1029968
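
The cells above use Pandas methods, while the task is about Python built-ins. The sketch below, on a made-up Series with one missing value, shows built-in `len()`, `max()` and `min()` applied directly, and where `len()` and the `.count()` method differ.

```python
import pandas as pd

# Hypothetical prices with one missing entry
prices = pd.Series(data=[200.0, 500.0, None, 20.0])

# Python built-in functions accept a Series directly
print(len(prices))     # 4, counts every element including the missing one
print(prices.count())  # 3, the Pandas method skips missing values

# Built-ins such as max() and min() are safest on data without missing values
complete = prices.dropna()
print(max(complete))   # 500.0
print(min(complete))   # 20.0
```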

**MINI CHALLENGE #7:**
- **Given the following Pandas Series, convert all negative values to positive using Python built-in functions**
- **Obtain only unique values (i.e., remove duplicates) using Python built-in functions**
- my_series = pd.Series(data = [-10, 100, -30, 50, 100])


In [20]:
BTC_price_series.mul(-1)

0        -457.334015
1        -424.440002
2        -394.795990
3        -408.903992
4        -398.821014
            ...     
2380   -55950.746090
2381   -57750.199220
2382   -58917.691410
2383   -58918.832030
2384   -59095.808590
Name: BTC-USD Price, Length: 2385, dtype: float64

In [22]:
set(BTC_price_series)

{8192.150391,
 8192.494141,
 8197.689453,
 8200.639648,
 8205.167969,
 8205.939453,
 8205.369141,
 8208.995117,
 8209.400391,
 8206.145508,
 32782.02344,
 16408.19922,
 8218.459961,
 8222.078125,
 8223.679688,
 8228.783203,
 8230.923828,
 49199.87109,
 8243.720703,
 8245.915039,
 8245.623047,
 8247.179688,
 8250.969727,
 8251.845703,
 8253.549805,
 8253.69043,
 8259.992188,
 8265.589844,
 8269.80957,
 8277.009766,
 24664.79102,
 16477.59961,
 8293.868164,
 8294.30957,
 8300.860352,
 8309.286133,
 8319.472656,
 8321.756836,
 8321.005859,
 8329.110352,
 8336.555664,
 8338.349609,
 8343.276367,
 8367.847656,
 8368.830078,
 178.1029968,
 57523.421879999994,
 16564.0,
 8374.686523,
 16569.40039,
 57539.94531,
 199.2599945,
 8393.041992,
 208.0970001,
 209.8439941,
 210.3390045,
 211.31500240000003,
 211.07899480000003,
 210.49499509999998,
 214.86099240000001,
 8406.515625,
 217.4640045,
 217.11099240000001,
 219.1589966,
 219.42999269999996,
 221.76400759999999,
 222.8820038,
 222.60000609

In [21]:
BTC_price_series.drop_duplicates()

0         457.334015
1         424.440002
2         394.795990
3         408.903992
4         398.821014
            ...     
2380    55950.746090
2381    57750.199220
2382    58917.691410
2383    58918.832030
2384    59095.808590
Name: BTC-USD Price, Length: 2382, dtype: float64

# TASK #8. SORTING PANDAS SERIES

In [35]:
# Let's import CSV data as follows:


0         457.334015
1         424.440002
2         394.795990
3         408.903992
4         398.821014
            ...     
2380    55950.746090
2381    57750.199220
2382    58917.691410
2383    58918.832030
2384    59095.808590
Name: BTC-USD Price, Length: 2385, dtype: float64

In [23]:
# You can sort the values in the Series as follows
BTC_price_series.sort_values()

119       178.102997
122       199.259995
121       208.097000
120       209.843994
123       210.339004
            ...     
2382    58917.691410
2383    58918.832030
2384    59095.808590
2366    59302.316410
2365    61243.085940
Name: BTC-USD Price, Length: 2385, dtype: float64

In [37]:
# Let's view the Pandas Series again after sorting. Note that nothing changed in memory! To change the Series itself, inplace must be set to True


0         457.334015
1         424.440002
2         394.795990
3         408.903992
4         398.821014
            ...     
2380    55950.746090
2381    57750.199220
2382    58917.691410
2383    58918.832030
2384    59095.808590
Name: BTC-USD Price, Length: 2385, dtype: float64

In [24]:
# Set inplace = True so that the change takes place in memory
BTC_price_series.sort_values(inplace=True)

In [25]:
# Note that the change (ordering) has now taken place
BTC_price_series

119       178.102997
122       199.259995
121       208.097000
120       209.843994
123       210.339004
            ...     
2382    58917.691410
2383    58918.832030
2384    59095.808590
2366    59302.316410
2365    61243.085940
Name: BTC-USD Price, Length: 2385, dtype: float64

In [26]:
# Notice that the order of the index labels has now changed
# You can also sort by index (reverting to the original order of the Pandas Series) as follows:
BTC_price_series.sort_index(inplace=True)
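
As an alternative to `inplace=True`, the sorted result can be assigned to a new name, which leaves the original Series available. The sketch below uses three made-up prices; `sorted_prices` and `restored` are illustrative names.

```python
import pandas as pd

# Hypothetical prices, not taken from crypto.csv
prices = pd.Series(data=[457.3, 178.1, 394.8])

# sort_values() returns a new Series; assigning it is an alternative to inplace=True
sorted_prices = prices.sort_values()
print(list(sorted_prices.index))  # [1, 2, 0]
print(list(prices.index))         # [0, 1, 2], the original is unchanged

# sort_index() puts the rows back in their original order
restored = sorted_prices.sort_index()
print(list(restored))             # [457.3, 178.1, 394.8]
```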

**MINI CHALLENGE #8:**
- **Sort the BTC_price_series values in descending order instead. Make sure to update values in-memory.**

In [27]:
BTC_price_series.sort_values(ascending=False)

2365    61243.085940
2366    59302.316410
2384    59095.808590
2383    58918.832030
2382    58917.691410
            ...     
123       210.339004
120       209.843994
121       208.097000
122       199.259995
119       178.102997
Name: BTC-USD Price, Length: 2385, dtype: float64

# TASK #9. PERFORM MATH OPERATIONS ON PANDAS SERIES

In [41]:
# Let's import CSV data as follows:


0         457.334015
1         424.440002
2         394.795990
3         408.903992
4         398.821014
            ...     
2380    55950.746090
2381    57750.199220
2382    58917.691410
2383    58918.832030
2384    59095.808590
Name: BTC-USD Price, Length: 2385, dtype: float64

In [28]:
# Apply Sum Method on Pandas Series
BTC_price_series.sum()

15435379.738852698

In [29]:
# Apply count Method on Pandas Series
BTC_price_series.count()

2385

In [30]:
# Obtain the maximum value
BTC_price_series.max()

61243.08594

In [31]:
# Obtain the minimum value
BTC_price_series.min()

178.1029968

In [32]:
# My favourite: Describe! 
# Describe is used to obtain the main summary statistics (count, mean, std, min, quartiles, max) in one place
BTC_price_series.describe()

count     2385.000000
mean      6471.857333
std       9289.022505
min        178.102997
25%        454.618988
50%       4076.632568
75%       8864.766602
max      61243.085940
Name: BTC-USD Price, dtype: float64
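
The result of `describe()` is itself a Series, so a single statistic can be picked out by its label. The sketch below uses the five crypto prices from earlier in the notebook rather than the CSV data; `stats` is an illustrative name.

```python
import pandas as pd

prices = pd.Series(data=[200, 500, 2000, 20, 50])
stats = prices.describe()

# Each statistic can be looked up by its label
print(stats['max'])    # 2000.0
print(stats['50%'])    # 200.0, the median
print(stats['count'])  # 5.0
```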

**MINI CHALLENGE #9:**
- **Obtain the average price of the BTC_price_series using two different methods**

In [33]:
BTC_price_series.mean()

6471.857332852284

# TASK #10. CHECK IF A GIVEN ELEMENT EXISTS IN A PANDAS SERIES

In [47]:
# Let's import CSV data as follows:


0         457.334015
1         424.440002
2         394.795990
3         408.903992
4         398.821014
            ...     
2380    55950.746090
2381    57750.199220
2382    58917.691410
2383    58918.832030
2384    59095.808590
Name: BTC-USD Price, Length: 2385, dtype: float64

In [34]:
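# Check whether a given number exists in a Pandas Series
# Caution: 'in' applied directly to a Series searches the index labels, not the values
# Use BTC_price_series.values to search the values instead
# Returns a boolean "True" or "False"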
1295.5 in BTC_price_series

False

In [35]:
# Check if a given number exists in a Pandas Series index
1295 in BTC_price_series.index

True

In [50]:
# Note that by default 'in' searches the Pandas Series index, not its values


True
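
The difference between searching the index and searching the values is easy to miss, so here is a small sketch on a made-up three-element Series. It also shows an element-wise comparison with `.any()` as another way to test membership among the values.

```python
import pandas as pd

# Hypothetical prices with the default index 0, 1, 2
prices = pd.Series(data=[10.5, 20.0, 30.25])

print(1 in prices)             # True: 1 is an index label
print(20.0 in prices)          # False: 20.0 is a value, not a label
print(20.0 in prices.values)   # True: searches the underlying array
print((prices == 20.0).any())  # True: element-wise comparison
```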

**MINI CHALLENGE #10:**
- **Check if the stock price 399 exists in the BTC_price_series Pandas Series or not**
- **Round stock prices to the nearest integer and check again**

In [36]:
399 in BTC_price_series.values

False

In [38]:
399 in BTC_price_series.values.round()

True

# EXCELLENT JOB!

# MINI CHALLENGE SOLUTIONS

**MINI CHALLENGE #1 SOLUTION:**
- **Define a Pandas Series named "my_series" that contains your top 3 favourite stocks. Confirm the datatype of "my_series"**

In [51]:
# Let's define a Python list that contains 3 top stocks
my_list = ['Facebook','Apple','Nvidia'] 
my_series = pd.Series(data = my_list) 
my_series

0    Facebook
1       Apple
2      Nvidia
dtype: object

In [52]:
type(my_series)

pandas.core.series.Series

**MINI CHALLENGE #2 SOLUTION:**
- **Define a Pandas Series named "my_series" that contains your top 3 favourite stocks. Instead of using default numeric indexes (similar to mini challenge #1), use the following indexes "stock #1", "stock #2", and "stock #3"**

In [53]:
# Let's define a Python list that contains 3 stocks as follows
my_list = ['Facebook','Apple','Nvidia'] 

# Let's define a python list as shown below. This python list will be used for the Series index:
my_labels = ['stock #1', 'stock #2', 'stock #3']


In [54]:
# Let's create a one-dimensional Pandas Series
# Let's use the Pandas constructor pd.Series() to create a Series from a Python list
# Note that this Series consists of data and associated custom labels
my_series = pd.Series(data = my_list, index = my_labels)
my_series

stock #1    Facebook
stock #2       Apple
stock #3      Nvidia
dtype: object

**MINI CHALLENGE #3 SOLUTION:**
- **Create a Pandas Series from a dictionary with 3 of your favourite stocks and their corresponding prices** 


In [55]:
stocks = {'Facebook': 3000, 
          'Apple'   : 400,
          'Nvidia'  : 2200}
print(stocks)

{'Facebook': 3000, 'Apple': 400, 'Nvidia': 2200}


In [56]:
# Let's define a Pandas Series Using the dictionary
my_series = pd.Series(stocks)
my_series

Facebook    3000
Apple        400
Nvidia      2200
dtype: int64

**MINI CHALLENGE #4 SOLUTION:** 
- **What is the size of the Pandas Series? (External Research is Required)**

In [57]:
# size is used to return the size of the series
crypto_series.size

5

**MINI CHALLENGE #5 SOLUTION:** 
- **Show the last 2 rows in the Pandas Series (External Research is Required)** 
- **How many bytes does this Pandas Series consume in memory? (External Research is Required)**

In [58]:
crypto_prices.tail(2)

3    20
4    70
dtype: int64

In [59]:
crypto_prices.memory_usage()

168

**MINI CHALLENGE #6 SOLUTION:**
- **Set squeeze = False and rerun the cell. What do you notice? Use type() to compare both outputs**

In [60]:
BTC_price_series = pd.read_csv('crypto.csv', squeeze = False)
# Note that when you set squeeze = False, the data is stored in a DataFrame (the default behaviour).
# A DataFrame stores multi-dimensional (tabular) data, as compared to a Pandas Series, which only holds a 1-D dataset
# Note that DataFrames are rendered with table formatting when you view them, as shown below
# Note that a Pandas Series is displayed without such formatting


In [61]:
BTC_price_series

Unnamed: 0,BTC-USD Price
0,457.334015
1,424.440002
2,394.795990
3,408.903992
4,398.821014
...,...
2380,55950.746090
2381,57750.199220
2382,58917.691410
2383,58918.832030


In [62]:
type(BTC_price_series)

pandas.core.frame.DataFrame

In [63]:
BTC_price_series = pd.read_csv('crypto.csv', squeeze = True)
type(BTC_price_series)

pandas.core.series.Series

**MINI CHALLENGE #7 SOLUTION:**
- **Given the following Pandas Series, convert all negative values to positive using Python built-in functions**
- **Obtain only unique values (i.e., remove duplicates) using Python built-in functions**
- my_series = pd.Series(data = [-10, 100, -30, 50, 100])


In [64]:
my_series = pd.Series(data = [-10, 100, -30, 50, 100])
my_series

0    -10
1    100
2    -30
3     50
4    100
dtype: int64

In [65]:
abs(my_series)

0     10
1    100
2     30
3     50
4    100
dtype: int64

In [66]:
set(my_series)

{-30, -10, 50, 100}

**MINI CHALLENGE #8 SOLUTION:**
- **Sort the BTC_price_series values in descending order instead. Make sure to update values in-memory.**

In [67]:
BTC_price_series.sort_values(ascending = False, inplace = True) 
BTC_price_series

2365    61243.085940
2366    59302.316410
2384    59095.808590
2383    58918.832030
2382    58917.691410
            ...     
123       210.339004
120       209.843994
121       208.097000
122       199.259995
119       178.102997
Name: BTC-USD Price, Length: 2385, dtype: float64

**MINI CHALLENGE #9 SOLUTION:**
- **Obtain the average price using two different methods**

In [68]:
# Obtain the average - Solution #1
BTC_price_series.sum()/BTC_price_series.count()

6471.857332852285

In [69]:
# Obtain the average - Solution #2
BTC_price_series.mean()

6471.857332852285

**MINI CHALLENGE #10 SOLUTION:**
- **Check if the stock price 399 exists in the BTC_price_series Pandas Series or not**
- **Round stock prices to the nearest integer and check again**

In [70]:
399 in BTC_price_series.values

False

In [71]:
prices_series = round(BTC_price_series)
prices_series

2365    61243.0
2366    59302.0
2384    59096.0
2383    58919.0
2382    58918.0
         ...   
123       210.0
120       210.0
121       208.0
122       199.0
119       178.0
Name: BTC-USD Price, Length: 2385, dtype: float64

In [72]:
399 in prices_series.values

True
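
One detail worth knowing when rounding a Series: values that sit exactly halfway between two integers are rounded to the nearest even integer, following NumPy's behaviour, rather than always up. The sketch below uses made-up values plus one price from the data above; `halves` is an illustrative name.

```python
import pandas as pd

halves = pd.Series(data=[0.5, 1.5, 2.5, 398.821014])

# Exact halves are rounded to the nearest even integer
print(list(halves.round()))  # [0.0, 2.0, 2.0, 399.0]

# The built-in round() delegates to the same Series method
print(list(round(halves)))   # [0.0, 2.0, 2.0, 399.0]
```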

In [3]:
import pandas as pd

In [4]:
crypto_list = ['BTC','XRP','LTC', 'ADA', 'ETH'] 
crypto_series = pd.Series(data = crypto_list)
crypto_series.dtype

dtype('O')

In [5]:
crypto_prices = pd.Series(data = [400, 500, 1500, 20, 70])
crypto_prices.mean()

498.0

In [6]:
my_series = pd.Series(data = [-100, 100, -300, 50, 100])
abs(my_series)

0    100
1    100
2    300
3     50
4    100
dtype: int64