# TASK #1. DEFINE A PANDAS SERIES

In [1]:
# Pandas is a data manipulation and analysis library built on top of NumPy.
# Pandas uses a data structure known as a DataFrame (think of it as Microsoft Excel in Python).
# DataFrames let programmers store and manipulate data in a tabular fashion (rows and columns).
# Series vs. DataFrame? A Series can be thought of as a single column of a DataFrame.

import numpy as np
import pandas as pd

In [2]:
# Let's define a Python list that contains 5 cryptocurrencies
crypto = ["BTC", "XRP", "LTC", "ADA", "ETH"]

In [3]:
# Let's confirm the Datatype
type(crypto)

list

In [4]:
# Let's create a one-dimensional Pandas Series
# Let's use the Pandas constructor to create a Series from a Python list
# Note that a Series is formed of data and an associated index (a numeric index is generated automatically)
# Check the Pandas documentation for more information: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series
# The object datatype is used for text data (strings)
crypto = ["BTC", "XRP", "LTC", "ADA", "ETH"]
crypto = pd.Series(crypto)
crypto

0    BTC
1    XRP
2    LTC
3    ADA
4    ETH
dtype: object

In [5]:
# Let's confirm the Pandas Series Datatype
type(crypto)

pandas.core.series.Series

In [6]:
# Let's define another Pandas Series that contains numeric values (crypto prices) instead of text data
# Note the int64 datatype: each value is an integer stored in 64 bits of memory
prices = pd.Series([2000, 500, 2000, 20, 50])
prices


0    2000
1     500
2    2000
3      20
4      50
dtype: int64
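
As a small illustrative sketch (the variable names and values below are hypothetical, not part of the original notebook), the element type of the input list determines the dtype that pandas infers. Exact dtype names can vary with the pandas version and platform, so the checks below rely on pandas' type helpers rather than literal dtype strings.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
tickers = pd.Series(["BTC", "XRP", "LTC"])      # text data
int_prices = pd.Series([2000, 500, 20])         # whole numbers only
float_prices = pd.Series([2000.5, 500.25, 20])  # one float makes the whole Series float

print(tickers.dtype)
print(int_prices.dtype)
print(float_prices.dtype)

# Type helpers give a version-independent way to inspect the result
print(pd.api.types.is_integer_dtype(int_prices))
print(pd.api.types.is_float_dtype(float_prices))
```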

**MINI CHALLENGE #1:**
- **Define a Pandas Series named "my_series" that contains your top 3 favourite stocks. Confirm the datatype of "my_series"**

In [7]:
my_series = pd.Series(['Appl', 'Tsla', 'Mcsf'])
type(my_series)

pandas.core.series.Series

# TASK #2. DEFINE A PANDAS SERIES WITH CUSTOM INDEX

In [8]:
# Let's define a Python list that contains 5 Crypto currencies
crypto_list = ['BTC', 'XRP', 'LTC', 'ADA', 'ETH']


In [9]:
# Let's define a python list as shown below. This python list will be used for the Series index:
crypto_index_list = ['crypto#1', 'crypto#2', 'crypto#3', 'crypto#4', 'crypto#5']


In [10]:
# Let's create a one dimensional Pandas "series" 
# Let's use Pandas Constructor Method to create a series from a Python list
# Note that this series is formed of data and associated labels 
crypto_series = pd.Series(crypto_list, index=crypto_index_list)


In [11]:
# Let's view the series
crypto_series

crypto#1    BTC
crypto#2    XRP
crypto#3    LTC
crypto#4    ADA
crypto#5    ETH
dtype: object

In [12]:
# Let's obtain the datatype
type(crypto_series)

pandas.core.series.Series
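
One reason to attach custom labels is that elements can then be looked up by label as well as by position. The sketch below rebuilds the same Series and uses the `.loc` and `.iloc` indexers; the result variable names are hypothetical.

```python
import pandas as pd

crypto_series = pd.Series(
    ["BTC", "XRP", "LTC", "ADA", "ETH"],
    index=["crypto#1", "crypto#2", "crypto#3", "crypto#4", "crypto#5"],
)

# Label-based lookup uses the custom index
first_by_label = crypto_series.loc["crypto#1"]

# Position-based lookup ignores the labels and counts from 0
first_by_position = crypto_series.iloc[0]

# Slicing by label includes both endpoints
middle = crypto_series.loc["crypto#2":"crypto#4"]

print(first_by_label, first_by_position)
print(middle)
```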

**MINI CHALLENGE #2:**
- **Define a Pandas Series named "my_series" that contains your top 3 favourite stocks. Instead of using default numeric indexes (similar to mini challenge #1), use the following indexes "stock #1", "stock #2", and "stock #3"**

In [13]:
my_series_index = ["stock #1", "stock #2", "stock #3"]
# my_series = pd.Series(['Appl', 'Tsla', 'Mcsf'], index=my_series_index)
my_series.index = my_series_index
my_series


stock #1    Appl
stock #2    Tsla
stock #3    Mcsf
dtype: object

# TASK #3. DEFINE A PANDAS SERIES FROM A DICTIONARY

In [14]:
# A dictionary consists of a collection of key-value pairs. Each key-value pair maps the key to its corresponding value.
# Keys are unique within a dictionary, while values may not be.
# List elements are accessed by their position in the list via indexing, while dictionary elements are accessed via keys.
# Define a dictionary named "my_dict" using key-value pairs


In [15]:
# Define the dictionary
my_dict = {'Employee ID': 1,
 'Employee Name': 'Steve',
 'Salary [$]': 2000,
 'Years with company': 10}



In [16]:
# Confirm the dictionary datatype 
type(my_dict)

dict

In [17]:
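# Let's define a Pandas Series using the dictionary (the keys become the index)
# Note that the dtype is object because the values mix integers and text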
serr_dict = pd.Series(my_dict)
serr_dict

Employee ID               1
Employee Name         Steve
Salary [$]             2000
Years with company       10
dtype: object
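
To illustrate how the dictionary maps onto a Series, here is a minimal sketch with hypothetical tickers and prices: the keys become the index, and when all values are integers pandas infers an integer dtype instead of object.

```python
import pandas as pd

# Hypothetical tickers and prices, for illustration only
stock_prices = {"AAA": 10, "BBB": 20, "CCC": 30}
homogeneous = pd.Series(stock_prices)

# Mixing integers and text forces the generic object dtype
mixed = pd.Series({"Employee ID": 1, "Employee Name": "Steve"})

print(list(homogeneous.index))  # the dictionary keys, in insertion order
print(homogeneous.dtype)
print(mixed.dtype)
print(homogeneous["BBB"])       # values are retrieved by key, as in a dictionary
```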

**MINI CHALLENGE #3:**
- **Create a Pandas Series from a dictionary with 3 of your favourite stocks and their corresponding prices** 

In [18]:
crypto_list = ['BTC', 'XRP', 'LTC', 'ADA', 'ETH']
prices = [2000, 500, 2000, 20, 50]

pairs = zip(crypto_list, prices)
prices_dict = {k: v for k, v in pairs}
prices_serr = pd.Series(prices_dict)
prices_serr



BTC    2000
XRP     500
LTC    2000
ADA      20
ETH      50
dtype: int64

# TASK #4. PANDAS ATTRIBUTES

In [19]:
# Attributes/Properties: do not use parentheses "()" and return properties of the Pandas Series. Ex: my_series.values, my_series.shape
# Methods: use parentheses "()" and might include arguments. Most return a new object and leave the original Series unchanged unless you reassign the result. Ex: my_series.tail(), my_series.head(), my_series.drop_duplicates()
# Indexers: use square brackets "[]" and are used to access specific elements in a Pandas Series or DataFrame. Ex: my_series.loc[], my_series.iloc[]

# Let's redefine a Pandas Series containing our favourite 5 cryptos 
crypto = ["BTC"
,"XRP"
,"LTC"
,"ADA"
,"ETH"]
crypto = pd.Series(crypto)
crypto

0    BTC
1    XRP
2    LTC
3    ADA
4    ETH
dtype: object

In [20]:
# The ".values" attribute returns the Series as an ndarray (or ndarray-like), depending on its dtype
# Check this for more information: https://pandas.pydata.org/docs/reference/api/pandas.Series.values.html#pandas.Series.values
crypto.values

array(['BTC', 'XRP', 'LTC', 'ADA', 'ETH'], dtype=object)

In [21]:
# index is used to return the index (axis labels) of the Series
crypto.index

RangeIndex(start=0, stop=5, step=1)

In [22]:
# dtype is used to return the datatype of the Series ('O' stands for 'object' datatype)
crypto.dtype # this is to see the data type of the Series. 

dtype('O')

In [23]:
# Check if all elements are unique or not
crypto.is_unique # returns True if all elements are unique

True

In [24]:
# Check the shape of the Series
# note that a Series is one dimensional
crypto.shape

(5,)

**MINI CHALLENGE #4:** 
- **What is the size of the Pandas Series? (External Research for the proper attribute is Required)**

In [25]:
crypto.size

5
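
The attributes covered in this task can be compared side by side. This is only a sketch; `with_duplicate` is a hypothetical second Series added to show `is_unique` returning False.

```python
import pandas as pd

crypto = pd.Series(["BTC", "XRP", "LTC", "ADA", "ETH"])

print(crypto.shape)      # a tuple with one entry, because a Series is one-dimensional
print(crypto.size)       # total number of elements
print(crypto.ndim)       # number of dimensions
print(crypto.is_unique)  # no parentheses: this is an attribute, not a method

# Hypothetical Series containing a repeated value
with_duplicate = pd.Series(["BTC", "BTC", "ETH"])
print(with_duplicate.is_unique)
```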

# TASK #5. PANDAS METHODS

In [26]:
# Methods use parentheses "()" and might include arguments. Ex: my_series.tail(), my_series.head(), my_series.drop_duplicates()
# Most methods return a new object rather than changing the Pandas Series itself

# Let's define another Pandas Series that contains numeric values (crypto prices) instead of text data
# Note that we have int64 datatype which means it contains integer values stored in 64 bits in memory
prices = pd.Series([2000, 500, 2000, 20, 50])

In [27]:
# Let's obtain the sum of all elements in the Pandas Series
prices.sum()

4570

In [28]:
# Let's obtain the multiplication of all elements in the Pandas Series
prices.prod()

2000000000000

In [29]:
# Let's obtain the median (the middle value once sorted); use prices.mean() for the arithmetic average
prices.median()

500.0

In [30]:
# Let's show the first couple of elements in the Pandas Series
prices.head(2)

0    2000
1     500
dtype: int64

In [31]:
# Note that head returns a new Series; prices itself is unchanged
prices.head(3)

0    2000
1     500
2    2000
dtype: int64
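
The following sketch shows that methods such as `head()` and `drop_duplicates()` return new objects and leave the original Series intact, and it also contrasts `mean()` with `median()` on the same prices. The result variable names are hypothetical.

```python
import pandas as pd

prices = pd.Series([2000, 500, 2000, 20, 50])

top_two = prices.head(2)                 # new Series holding the first two rows
deduplicated = prices.drop_duplicates()  # new Series without the repeated 2000

# The original Series still has all five elements
print(len(prices), len(top_two), len(deduplicated))

# The mean and the median are different statistics
print(prices.mean())    # sum of the values divided by their count
print(prices.median())  # middle value after sorting
```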

**MINI CHALLENGE #5:** 
- **Show the last 2 rows in the Pandas Series (External Research is Required)** 
- **How many bytes does this Pandas Series consume in memory? (External Research is Required)**

In [32]:
print(prices.tail(2))
print(prices.memory_usage())

3    20
4    50
dtype: int64
168
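
As a rough sketch of where the memory figure comes from, `memory_usage()` counts the bytes of the values plus the index. Passing `index=False` isolates the values. The dtype is set explicitly here (an assumption made for reproducibility) so that the byte count does not depend on the platform; the size of the index part can differ between pandas versions.

```python
import pandas as pd

# dtype is fixed explicitly so the byte counts do not depend on the platform
prices = pd.Series([2000, 500, 2000, 20, 50], dtype="int64")

values_only = prices.memory_usage(index=False)  # bytes used by the values alone
with_index = prices.memory_usage()              # values plus the index

print(values_only)    # 5 values * 8 bytes each
print(prices.nbytes)  # same figure, read from the underlying data
print(with_index >= values_only)
```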


# TASK #6. IMPORT CSV DATA (1-D) USING PANDAS

In [33]:
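# Pandas read_csv is used to read a csv file and store data in a DataFrame by default (DataFrames will be covered shortly!)
# Use squeeze=True to convert a single-column result into a Pandas Series (one-dimensional)
# Note: the squeeze argument was deprecated in pandas 1.4 and removed in pandas 2.0; on newer versions use pd.read_csv('crypto.csv').squeeze('columns') instead
# Notice that no formatting exists when a Series is printed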


In [34]:
crypto = pd.read_csv('crypto.csv', squeeze=True)
print(crypto)
type(crypto) # this is now a series

0         457.334015
1         424.440002
2         394.795990
3         408.903992
4         398.821014
            ...     
2380    55950.746090
2381    57750.199220
2382    58917.691410
2383    58918.832030
2384    59095.808590
Name: BTC-USD Price, Length: 2385, dtype: float64




pandas.core.series.Series

**MINI CHALLENGE #6:**
- **Set Squeeze = False and rerun the cell, what do you notice? Use Type to compare both outputs**

In [35]:
crypto = pd.read_csv('crypto.csv', squeeze=False) # this is now a data frame.
print(crypto)
type(crypto) # this is now a data frame

      BTC-USD Price
0        457.334015
1        424.440002
2        394.795990
3        408.903992
4        398.821014
...             ...
2380   55950.746090
2381   57750.199220
2382   58917.691410
2383   58918.832030
2384   59095.808590

[2385 rows x 1 columns]




pandas.core.frame.DataFrame
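
On pandas versions where `read_csv` no longer accepts `squeeze`, the same Series-versus-DataFrame comparison can be sketched as follows. The in-memory text is a hypothetical stand-in for `crypto.csv` (one header line and a single column), so the example runs without the file.

```python
import io
import pandas as pd

# Hypothetical stand-in for crypto.csv: one header line and a single column
csv_text = "BTC-USD Price\n457.334015\n424.440002\n394.795990\n"

frame = pd.read_csv(io.StringIO(csv_text))  # always a DataFrame
series = frame.squeeze("columns")           # a single column collapses to a Series

print(type(frame).__name__)
print(type(series).__name__)
print(series.name)  # the column header becomes the Series name
```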

# TASK #7. PANDAS BUILT-IN FUNCTIONS

In [36]:
# Pandas works well with pre-existing Python functions
# You don't always need Pandas methods; you can apply Python built-in functions directly to a Series
# Check Python built-in functions here: https://docs.python.org/3/library/functions.html
import builtins

# Get a list of all available functions in Python
all_functions = dir(builtins)

# Print the list of functions
print(all_functions)



In [37]:
# Obtain the Data Type of the Pandas Series
crypto = pd.read_csv('crypto.csv', squeeze=True)
print(crypto)
type(crypto) # this is now a series


0         457.334015
1         424.440002
2         394.795990
3         408.903992
4         398.821014
            ...     
2380    55950.746090
2381    57750.199220
2382    58917.691410
2383    58918.832030
2384    59095.808590
Name: BTC-USD Price, Length: 2385, dtype: float64




pandas.core.series.Series

In [38]:
# Obtain the number of non-null elements in the Pandas Series (the Python built-in len(crypto) returns the total length)
crypto.count()

2385

In [39]:
# Obtain the maximum value of the Pandas Series
crypto.max()

61243.08594

In [40]:
# Obtain the minimum value of the Pandas Series
crypto.min()

178.1029968

**MINI CHALLENGE #7:**
- **Given the following Pandas Series, convert all positive values to negative using python built-in functions**
- **Obtain only unique values (ie: Remove duplicates) using python built-in functions**
- my_series = pd.Series(data = [-10, 100, -30, 50, 100])


In [41]:
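my_series = pd.Series(data = [-10, 100, -30, 50, 100]) # define the Series given in the challenge first
my_series = my_series.where(my_series<0, my_series*-1) # keep negative values, negate the rest
my_series

0     -10
1    -100
2     -30
3     -50
4    -100
dtype: int64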

In [None]:
my_series.unique()
set(my_series)

{-100, -50, -30, -10}
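
For comparison, here is a minimal sketch of plain Python built-in functions applied directly to a Series. The values are hypothetical and chosen only so that the results are easy to verify by hand.

```python
import pandas as pd

# Hypothetical values, for illustration only
numbers = pd.Series([3, -7, 3, 12])

print(len(numbers))                 # number of elements
print(max(numbers), min(numbers))   # largest and smallest values
print(sum(numbers))                 # total of all values
print(sorted(set(numbers)))         # unique values in ascending order
print(list(abs(numbers)))           # abs() works element-wise and returns a Series
```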

# TASK #8. SORTING PANDAS SERIES

In [None]:
# Let's import CSV data as follows:
crypto = pd.read_csv('crypto.csv', squeeze=True)
print(crypto)
type(crypto) # this is now a series

0         457.334015
1         424.440002
2         394.795990
3         408.903992
4         398.821014
            ...     
2380    55950.746090
2381    57750.199220
2382    58917.691410
2383    58918.832030
2384    59095.808590
Name: BTC-USD Price, Length: 2385, dtype: float64




pandas.core.series.Series

In [None]:
# You can sort the values in the Series as follows
crypto.sort_values(ascending=True, axis=0) # without inplace=True, a new sorted Series is returned

119       178.102997
122       199.259995
121       208.097000
120       209.843994
123       210.339004
            ...     
2382    58917.691410
2383    58918.832030
2384    59095.808590
2366    59302.316410
2365    61243.085940
Name: BTC-USD Price, Length: 2385, dtype: float64

In [None]:
# Let's view the Pandas Series again after sorting. Note that nothing changed in memory! To change it, you have to make sure that inplace is set to True
crypto

0         457.334015
1         424.440002
2         394.795990
3         408.903992
4         398.821014
            ...     
2380    55950.746090
2381    57750.199220
2382    58917.691410
2383    58918.832030
2384    59095.808590
Name: BTC-USD Price, Length: 2385, dtype: float64

In [None]:
# Set inplace = True to ensure that the change takes place in memory
crypto.sort_values(ascending=True, axis=0, inplace=True) # inplace=True modifies crypto directly and returns None

In [None]:
# Note that now the change (ordering) took place
crypto

119       178.102997
122       199.259995
121       208.097000
120       209.843994
123       210.339004
            ...     
2382    58917.691410
2383    58918.832030
2384    59095.808590
2366    59302.316410
2365    61243.085940
Name: BTC-USD Price, Length: 2385, dtype: float64

In [None]:
# Notice that the index labels are now out of order
# You can also sort by index (to view the Series in its original order) as follows:
crypto.sort_index(ascending=True, axis=0, inplace=False) # inplace=False returns a new sorted Series and leaves crypto unchanged

0         457.334015
1         424.440002
2         394.795990
3         408.903992
4         398.821014
            ...     
2380    55950.746090
2381    57750.199220
2382    58917.691410
2383    58918.832030
2384    59095.808590
Name: BTC-USD Price, Length: 2385, dtype: float64
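
An alternative to `inplace=True` is to keep the returned Series under a new name. The sketch below uses a few hypothetical prices to show that the original labels travel with the values during sorting, and that `sort_index()` restores the original order.

```python
import pandas as pd

# Hypothetical prices, for illustration only
prices = pd.Series([457.3, 424.4, 394.8, 408.9])

sorted_prices = prices.sort_values()  # new Series; prices is untouched

print(list(prices.index))         # original order of the labels
print(list(sorted_prices.index))  # labels follow their values

# Sorting by index brings back the original order
restored = sorted_prices.sort_index()
print(restored.equals(prices))
```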

**MINI CHALLENGE #8:**
- **Sort the BTC_price_series values in descending order instead. Make sure to update the values in memory.**

In [None]:
crypto.sort_values(ascending=False, axis=0, inplace=True)
crypto

2365    61243.085940
2366    59302.316410
2384    59095.808590
2383    58918.832030
2382    58917.691410
            ...     
123       210.339004
120       209.843994
121       208.097000
122       199.259995
119       178.102997
Name: BTC-USD Price, Length: 2385, dtype: float64

# TASK #9. PERFORM MATH OPERATIONS ON PANDAS SERIES

In [None]:
# Let's import CSV data as follows:
crypto = pd.read_csv('crypto.csv', squeeze=True)
print(crypto)
type(crypto) # this is now a series

0         457.334015
1         424.440002
2         394.795990
3         408.903992
4         398.821014
            ...     
2380    55950.746090
2381    57750.199220
2382    58917.691410
2383    58918.832030
2384    59095.808590
Name: BTC-USD Price, Length: 2385, dtype: float64




pandas.core.series.Series

In [None]:
# Apply Sum Method on Pandas Series
crypto.sum()

15435379.738852698

In [None]:
# Apply count Method on Pandas Series
crypto.count()

2385

In [None]:
# Obtain the maximum value
crypto.max()

61243.08594

In [None]:
# Obtain the minimum value
crypto.min()

178.1029968

In [None]:
# My favourite: Describe! 
# Describe is used to obtain all statistical information in one place 
crypto.describe()

count     2385.000000
mean      6471.857333
std       9289.022505
min        178.102997
25%        454.618988
50%       4076.632568
75%       8864.766602
max      61243.085940
Name: BTC-USD Price, dtype: float64

**MINI CHALLENGE #9:**
- **Obtain the average price of the BTC_price_series using two different methods**

In [None]:
print(crypto.mean(), np.mean(crypto))

6471.857332852284 6471.857332852284
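
The individual statistics reported by `describe()` can also be read out by label, since the result is itself a Series. A minimal sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical values, for illustration only
values = pd.Series([1.0, 2.0, 3.0, 4.0])
summary = values.describe()

print(summary["mean"])                # average of the values
print(summary["50%"])                 # the median
print(values.sum() / values.count())  # the same average, computed manually
```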


# TASK #10. CHECK IF A GIVEN ELEMENT EXISTS IN A PANDAS SERIES

In [None]:
# Let's import CSV data as follows:
crypto = pd.read_csv('crypto.csv', squeeze=True)
print(crypto)
type(crypto) # this is now a series

0         457.334015
1         424.440002
2         394.795990
3         408.903992
4         398.821014
            ...     
2380    55950.746090
2381    57750.199220
2382    58917.691410
2383    58918.832030
2384    59095.808590
Name: BTC-USD Price, Length: 2385, dtype: float64




pandas.core.series.Series

In [44]:
# Compare the value stored at position 1 with a given number
# Returns a boolean "True" or "False"
# Note: this tests a single element only; use `1 in crypto.values` to search all values
crypto.iloc[1] == 1

False

In [49]:
# Compare the first value with its exact stored price
# Note: to check whether a label exists in the index, use `0 in crypto.index`
crypto.iloc[0] == 457.3340149

True

In [52]:
# Note that by default 'in' will search in Pandas index and not values
15 in crypto

True
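
The difference between searching the index and searching the values can be sketched with a few hypothetical prices. With the default numeric index, `in` only looks at the labels, so a price is not found unless `.values` is used.

```python
import pandas as pd

# Hypothetical prices, for illustration only
prices = pd.Series([457.334015, 424.440002, 394.795990])

label_found = 1 in prices                         # 1 is an index label
price_as_label = 424.440002 in prices             # searched among labels, not values
price_in_values = 424.440002 in prices.values     # searched among the values

print(label_found, price_as_label, price_in_values)
```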

**MINI CHALLENGE #10:**
- **Check if the stock price 399 exists in the BTC_price_series Pandas Series or not**
- **Round stock prices to the nearest integer and check again**

In [66]:
crypto.isin([399]).any() # any() returns True if at least one element is True



False

In [76]:
crypto.round().astype(int).isin([399]).any()
# first, round all the values to the nearest integer with .round() (np.ceil would always round up instead)
# second, convert the data type to integer using .astype(int)
# third, check if 399 is in the new series with isin([list of numbers])
# fourth, check whether any of the resulting booleans is True

True
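
Rounding to the nearest integer and rounding up can give different answers, which matters for a membership check like the one above. A small sketch with two hypothetical prices:

```python
import numpy as np
import pandas as pd

# Hypothetical prices, for illustration only
prices = pd.Series([398.2, 398.821014])

nearest = prices.round()      # rounds to the nearest integer
rounded_up = np.ceil(prices)  # always rounds up

print(nearest.tolist())
print(rounded_up.tolist())
print(bool((nearest == 399).any()))
```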

# EXCELLENT JOB!

# MINI CHALLENGE SOLUTIONS

**MINI CHALLENGE #1 SOLUTION:**
- **Define a Pandas Series named "my_series" that contains your top 3 favourite stocks. Confirm the datatype of "my_series"**

In [51]:
# Let's define a Python list that contains 3 top stocks
my_list = ['Facebook','Apple','Nvidia'] 
my_series = pd.Series(data = my_list) 
my_series

0    Facebook
1       Apple
2      Nvidia
dtype: object

In [52]:
type(my_series)

pandas.core.series.Series

**MINI CHALLENGE #2 SOLUTION:**
- **Define a Pandas Series named "my_series" that contains your top 3 favourite stocks. Instead of using default numeric indexes (similar to mini challenge #1), use the following indexes "stock #1", "stock #2", and "stock #3"**

In [53]:
# Let's define a Python list that contains 3 stocks as follows
my_list = ['Facebook','Apple','Nvidia'] 

# Let's define a python list as shown below. This python list will be used for the Series index:
my_labels = ['stock #1', 'stock #2', 'stock #3']


In [54]:
# Let's create a one dimensional Pandas "series" 
# Let's use Pandas Constructor Method to create a series from a Python list
# Note that this series is formed of data and associated labels 
my_series = pd.Series(data = my_list, index = my_labels)
my_series

stock #1    Facebook
stock #2       Apple
stock #3      Nvidia
dtype: object

**MINI CHALLENGE #3 SOLUTION:**
- **Create a Pandas Series from a dictionary with 3 of your favourite stocks and their corresponding prices** 


In [55]:
stocks = {'Facebook': 3000, 
          'Apple'   : 400,
          'Nvidia'  : 2200}
print(stocks)

{'Facebook': 3000, 'Apple': 400, 'Nvidia': 2200}


In [56]:
# Let's define a Pandas Series Using the dictionary
my_series = pd.Series(stocks)
my_series

Facebook    3000
Apple        400
Nvidia      2200
dtype: int64

**MINI CHALLENGE #4 SOLUTION:** 
- **What is the size of the Pandas Series? (External Research is Required)**

In [57]:
# size is used to return the size of the series
crypto_series.size

5

**MINI CHALLENGE #5 SOLUTION:** 
- **Show the last 2 rows in the Pandas Series (External Research is Required)** 
- **How many bytes does this Pandas Series consume in memory? (External Research is Required)**

In [58]:
crypto_prices.tail(2)

3    20
4    70
dtype: int64

In [59]:
crypto_prices.memory_usage()

168

**MINI CHALLENGE #6 SOLUTION:**
- **Set Squeeze = False and rerun the cell, what do you notice? Use Type to compare both outputs**

In [60]:
BTC_price_series = pd.read_csv('crypto.csv', squeeze = False)
# Note that when you set squeeze = False, the data is stored in a DataFrame by default.
# A DataFrame is used to store multi-dimensional data, as compared to a Pandas Series that only holds a 1-D dataset
# Note that DataFrames have proper formatting when you attempt to view them as shown below
# Note that a Pandas Series has no formatting


In [61]:
BTC_price_series

      BTC-USD Price
0        457.334015
1        424.440002
2        394.795990
3        408.903992
4        398.821014
...             ...
2380   55950.746090
2381   57750.199220
2382   58917.691410
2383   58918.832030


In [62]:
type(BTC_price_series)

pandas.core.frame.DataFrame

In [63]:
BTC_price_series = pd.read_csv('crypto.csv', squeeze = True)
type(BTC_price_series)

pandas.core.series.Series

**MINI CHALLENGE #7 SOLUTION:**
- **Given the following Pandas Series, convert all positive values to negative using python built-in functions**
- **Obtain only unique values (ie: Remove duplicates) using python built-in functions**
- my_series = pd.Series(data = [-10, 100, -30, 50, 100])


In [64]:
my_series = pd.Series(data = [-10, 100, -30, 50, 100])
my_series

0    -10
1    100
2    -30
3     50
4    100
dtype: int64

In [65]:
abs(my_series)

0     10
1    100
2     30
3     50
4    100
dtype: int64

In [66]:
set(my_series)

{-30, -10, 50, 100}

**MINI CHALLENGE #8 SOLUTION:**
- **Sort the BTC_price_series values in descending order instead. Make sure to update the values in memory.**

In [67]:
BTC_price_series.sort_values(ascending = False, inplace = True) 
BTC_price_series

2365    61243.085940
2366    59302.316410
2384    59095.808590
2383    58918.832030
2382    58917.691410
            ...     
123       210.339004
120       209.843994
121       208.097000
122       199.259995
119       178.102997
Name: BTC-USD Price, Length: 2385, dtype: float64

**MINI CHALLENGE #9 SOLUTION:**
- **Obtain the average price using two different methods**

In [68]:
# Obtain the average - Solution #1
BTC_price_series.sum()/BTC_price_series.count()

6471.857332852285

In [69]:
# Obtain the average - Solution #2
BTC_price_series.mean()

6471.857332852285

**MINI CHALLENGE #10 SOLUTION:**
- **Check if the stock price 399 exists in the BTC_price_series Pandas Series or not**
- **Round stock prices to the nearest integer and check again**

In [70]:
399 in BTC_price_series.values

False

In [71]:
prices_series = round(BTC_price_series)
prices_series

2365    61243.0
2366    59302.0
2384    59096.0
2383    58919.0
2382    58918.0
         ...   
123       210.0
120       210.0
121       208.0
122       199.0
119       178.0
Name: BTC-USD Price, Length: 2385, dtype: float64

In [72]:
399 in prices_series.values

True