# Series

In [6]:
import pandas as pd

In [7]:
print(pd.__version__)

2.2.2


## Create a Series Object from a List
- A pandas **Series** is a one-dimensional labelled array.
- A Series combines the best features of a list and a dictionary.
- A Series maintains a single collection of ordered values (i.e. a single column of data).
- We can assign each value an identifier, which does not have to *be* unique.
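
As a minimal sketch of the points above, the following builds a Series from a plain Python list. The flavour names and variable names are made up for illustration and do not come from the datasets used in these notes. When no index is supplied, pandas assigns integer labels starting at 0, and custom labels are allowed to repeat.

```python
import pandas as pd

# Hypothetical sample data for illustration only
ice_cream = ["Chocolate", "Vanilla", "Strawberry", "Rum Raisin"]

# With no index supplied, pandas labels the values 0, 1, 2, ...
flavours = pd.Series(ice_cream)
print(flavours)

# Index labels may repeat; here two values share the label "a"
repeated = pd.Series([10, 20, 30], index=["a", "a", "b"])
print(repeated.loc["a"])
```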

In [110]:
wd = pd.read_csv(r'/Users/teslim/OneDrive - Teslim Uthman Adeyanju/Teslim_data_science_study/note_0_database/word_population_data.csv', usecols=["1980 Population"]).squeeze("columns")
wd

0      12486631
1       2941651
2      18739378
3         32886
4         35611
         ...   
229       11315
230      116775
231     9204938
232     5720438
233     7049926
Name: 1980 Population, Length: 234, dtype: int64

In [111]:
wd.count()

234

In [112]:
wd.size

234

## Create a Series Object from a Dictionary
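
A minimal sketch, assuming a small made-up dictionary: passing a dictionary to the `Series` constructor uses the keys as index labels and the dictionary values as the Series values. The `sushi` data below is hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration only
sushi = {"Salmon": "Orange", "Tuna": "Red", "Eel": "Brown"}

# Keys become the index labels, values become the Series values
colours = pd.Series(sushi)
print(colours)
print(colours.loc["Tuna"])
```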

## Intro to Series Methods
- The syntax to invoke a method on any object is `object.method()`.
- The `sum` method adds together the **Series'** values.
- The `product` method multiplies the **Series'** values.
- The `mean` method finds the average of the **Series'** values.
- The `std` method finds the standard deviation of the **Series'** values.
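
The methods listed above can be tried on a small Series. This sketch uses the same five values as the `price` Series in these notes; the result variable names are only illustrative.

```python
import pandas as pd

price = pd.Series([1, 2, 3, 4, 5])

total = price.sum()        # 1 + 2 + 3 + 4 + 5
product = price.product()  # 1 * 2 * 3 * 4 * 5
average = price.mean()
spread = price.std()       # sample standard deviation (ddof=1 by default)

print(total, product, average, spread)
```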

In [9]:
price = pd.Series([1, 2, 3, 4, 5])
price

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [10]:
price.memory_usage()

168

## Intro to Attributes
- An **attribute** is a piece of data that lives on an object.
- An **attribute** is a fact, a detail, a characteristic of the object.
- Access an attribute with `object.attribute` syntax.
- The `size` attribute returns a count of the number of values in the **Series**.
- The `is_unique` attribute returns True if the **Series** has no duplicate values.
- The `values` and `index` attributes return the underlying objects that hold the **Series'** values and index labels.
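
A short sketch of attribute access, again on a five-value Series chosen for illustration. Note that attributes are read without parentheses, unlike methods.

```python
import pandas as pd

price = pd.Series([1, 2, 3, 4, 5])

print(price.size)       # number of values; no parentheses
print(price.is_unique)  # True because no value repeats
print(price.values)     # NumPy array holding the values
print(price.index)      # RangeIndex holding the labels
```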

## Parameters and Arguments
- A **parameter** is the name for an expected input to a function/method/class instantiation.
- An **argument** is the concrete value we provide for a parameter during invocation.
- We can pass arguments either sequentially (based on parameter order) or with explicit parameter names written out.
- The first two parameters for the **Series** constructor are `data` and `index`, which represent the values and the index labels.
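
To illustrate the difference between positional and keyword arguments, the sketch below builds the same Series twice. The `fruits` and `weekdays` lists are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data for illustration only
fruits = ["Apple", "Orange", "Plum"]
weekdays = ["Monday", "Tuesday", "Wednesday"]

# Positional arguments: matched to `data` and `index` by order
by_position = pd.Series(fruits, weekdays)

# Keyword arguments: the order in which we write them no longer matters
by_keyword = pd.Series(index=weekdays, data=fruits)

print(by_position.equals(by_keyword))
```

Keyword arguments are longer to type but make the call easier to read, especially for functions with many parameters.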

## Import Series with the pd.read_csv Function
- A **CSV** is a plain text file that uses line breaks to separate rows and commas to separate row values.
- Pandas ships with many different `read_` functions for different types of files.
- The `read_csv` function accepts many different parameters. The first one specifies the file name/path.
- The `read_csv` function will import the dataset as a **DataFrame**, a 2-dimensional table.
- The `usecols` parameter accepts a list of the column(s) to import.
- The `squeeze` method converts a **DataFrame** to a **Series**.
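
The sketch below walks through the same steps without needing a file on disk. It assumes a tiny made-up CSV held in memory with `io.StringIO`, which `read_csv` accepts in place of a file path; the rows are illustrative and simplified.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk
csv_text = "Name,Type\nBulbasaur,Grass\nCharmander,Fire\nSquirtle,Water\n"

# read_csv returns a DataFrame even when only one column is imported
frame = pd.read_csv(io.StringIO(csv_text), usecols=["Name"])
print(type(frame))

# squeeze("columns") collapses the single-column DataFrame into a Series
names = frame.squeeze("columns")
print(type(names))
```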

In [11]:
pd.read_csv("pokemon.csv", usecols=["Name"]).squeeze("columns")

0          Bulbasaur
1            Ivysaur
2           Venusaur
3         Charmander
4         Charmeleon
            ...     
1005    Iron Valiant
1006        Koraidon
1007        Miraidon
1008    Walking Wake
1009     Iron Leaves
Name: Name, Length: 1010, dtype: object

In [12]:
world_population = pd.read_csv("word_population_data.csv", usecols=["Capital", "2022 Population"], index_col=0).squeeze("columns")
world_population.head(10)


Capital
Kabul               41128771
Tirana               2842321
Algiers             44903225
Pago Pago              44273
Andorra la Vella       79824
Luanda              35588987
The Valley             15857
Saint John’s           93763
Buenos Aires        45510318
Yerevan              2780469
Name: 2022 Population, dtype: int64

In [13]:

pd.read_csv("google_stock_price.csv")

Unnamed: 0,Date,Price
0,2004-08-19,2.490664
1,2004-08-20,2.515820
2,2004-08-23,2.758411
3,2004-08-24,2.770615
4,2004-08-25,2.614201
...,...,...
4788,2023-08-28,132.080002
4789,2023-08-29,132.998001
4790,2023-08-30,135.570007
4791,2023-08-31,137.050003


## The head and tail Methods
- The `head` method returns a number of rows from the top/beginning of the `Series`.
- The `tail` method returns a number of rows from the bottom/end of the `Series`.
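
A minimal sketch of both methods on a made-up Series of ten numbers. Both default to 5 rows when called without an argument.

```python
import pandas as pd

numbers = pd.Series(range(10, 20))

print(numbers.head())   # first 5 rows by default
print(numbers.tail(3))  # last 3 rows
```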

## Passing Series to Python's Built-In Functions
- The `len` function returns the length of the **Series**.
- The `type` function returns the type of an object.
- The `list` function converts the **Series** to a list.
- The `dict` function converts the **Series** to a dictionary.
- The `sorted` function converts the **Series** to a sorted list.
- The `max` function returns the largest value in the **Series**.
- The `min` function returns the smallest value in the **Series**.
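
The built-in functions above can be sketched on a small Series with made-up labels and values. Note that `list` and `sorted` keep only the values, while `dict` uses the index labels as keys.

```python
import pandas as pd

# Hypothetical sample data for illustration only
scores = pd.Series([30, 10, 20], index=["c", "a", "b"])

print(len(scores))     # 3
print(type(scores))
print(list(scores))    # values only, in their current order
print(dict(scores))    # index labels become the keys
print(sorted(scores))  # sorted list of the values
print(max(scores), min(scores))
```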

## Check for Inclusion with Python's in Keyword
- The `in` keyword checks if a value exists within an object.
- The `in` keyword will look for a value in the **Series's** index.
- Use the `index` and `values` attributes to access "nested" objects within the **Series**.
- Combine the `in` keyword with `values` to search within the **Series's** values.
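
The difference between searching the index and searching the values is easy to miss, so here is a minimal sketch on a two-row Series with illustrative data.

```python
import pandas as pd

# Hypothetical sample data for illustration only
types = pd.Series(["Grass", "Fire"], index=["Bulbasaur", "Charmander"])

print("Bulbasaur" in types)        # True: matches an index label
print("Grass" in types)            # False: "Grass" is a value, not a label
print("Grass" in types.values)     # True: searches the values
print("Bulbasaur" in types.index)  # True: explicit index search
```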

In [14]:
pokemon = pd.read_csv("pokemon.csv", usecols=["Name"]).squeeze("columns")
google = pd.read_csv("google_stock_price.csv", usecols=["Price"]).squeeze("columns")

pokemon.head()

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [15]:
pokemon.index

RangeIndex(start=0, stop=1010, step=1)

In [16]:
"Ivysaur" in pokemon.values


True

## The sort_values Method
- The `sort_values` method sorts a **Series'** values in order.
- By default, pandas applies an ascending sort (smallest to largest).
- Customize the sort order with the `ascending` parameter.

In [17]:
google.sort_values()

10        2.470490
0         2.490664
13        2.509095
11        2.514326
1         2.515820
           ...    
4341    150.000000
4336    150.000000
4346    150.141754
4345    151.000000
4395    151.863495
Name: Price, Length: 4793, dtype: float64

In [18]:
google.sort_values(ascending=False).head()

4395    151.863495
4345    151.000000
4346    150.141754
4341    150.000000
4336    150.000000
Name: Price, dtype: float64

In [19]:
pokemon.sort_values(ascending=False).head()

717     Zygarde
633    Zweilous
40        Zubat
569       Zorua
570     Zoroark
Name: Name, dtype: object

## The sort_index Method
- The `sort_index` method sorts a **Series** by its index.
- The `sort_index` method also accepts an `ascending` parameter to set sort order.

In [20]:
pokeman_3 = pd.read_csv("pokemon.csv", index_col=["Name"]).squeeze("columns")
pokeman_3


Name
Bulbasaur          Grass, Poison
Ivysaur            Grass, Poison
Venusaur           Grass, Poison
Charmander                  Fire
Charmeleon                  Fire
                      ...       
Iron Valiant     Fairy, Fighting
Koraidon        Fighting, Dragon
Miraidon        Electric, Dragon
Walking Wake       Water, Dragon
Iron Leaves       Grass, Psychic
Name: Type, Length: 1010, dtype: object

In [21]:
pokeman_3.sort_index(ascending=False)

Name
Zygarde      Dragon, Ground
Zweilous       Dark, Dragon
Zubat        Poison, Flying
Zorua                  Dark
Zoroark                Dark
                  ...      
Aegislash      Steel, Ghost
Accelgor                Bug
Absol                  Dark
Abra                Psychic
Abomasnow        Grass, Ice
Name: Type, Length: 1010, dtype: object

In [22]:
pokeman_3.sort_values(ascending=False)

Name
Empoleon      Water, Steel
Corsola        Water, Rock
Relicanth      Water, Rock
Drednaw        Water, Rock
Carracosta     Water, Rock
                  ...     
Scatterbug             Bug
Spewpa                 Bug
Metapod                Bug
Caterpie               Bug
Wurmple                Bug
Name: Type, Length: 1010, dtype: object

## Extract Series Value by Index Position
- Use the `iloc` accessor to extract a **Series** value by its index position.
- `iloc` is short for "integer location".
- Python's list slicing syntaxes (slices, slices from start, slices to end, etc.) are supported with **Series** objects.

In [65]:
uk = pd.read_excel(r'/Users/teslim/OneDrive - Teslim Uthman Adeyanju/Teslim_data_science_study/note_0_database/uk_2024_election.xlsx', index_col=0)


In [66]:
uk.head()

Unnamed: 0_level_0,REGION,2019,2024,TURNOUT %,SHARE %
CONSTITUENCY,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Amber Valley,East Midlands,Conservative,Labour,60.3,37.0
Ashfield,East Midlands,Conservative,Reform UK,58.6,42.8
Bassetlaw,East Midlands,Conservative,Labour,57.4,41.2
Bolsover,East Midlands,Conservative,Labour,56.5,40.5
Boston & Skegness,East Midlands,Conservative,Reform UK,53.4,38.4


In [67]:
uk.iloc[0]

REGION       East Midlands
2019          Conservative
2024                Labour
TURNOUT %             60.3
SHARE %               37.0
Name: Amber Valley, dtype: object

In [71]:
uk.loc["Amber Valley"] = 10

In [72]:
uk.head()

Unnamed: 0_level_0,REGION,2019,2024,TURNOUT %,SHARE %
CONSTITUENCY,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Amber Valley,10,10,10,10.0,10.0
Ashfield,East Midlands,Conservative,Reform UK,58.6,42.8
Bassetlaw,East Midlands,Conservative,Labour,57.4,41.2
Bolsover,East Midlands,Conservative,Labour,56.5,40.5
Boston & Skegness,East Midlands,Conservative,Reform UK,53.4,38.4


In [54]:
pokeman_3.iloc[400]
world_population.iloc[0]

41128771

In [101]:
uk.count()

REGION       650
2019         650
2024         650
TURNOUT %    649
SHARE %      649
dtype: int64

In [102]:
# size is an attribute, not a method, so calling it with parentheses fails; use uk.size instead
uk.size()

TypeError: 'int' object is not callable

In [24]:
pokeman_3.iloc[[2,4,5,6]]

Name
Venusaur      Grass, Poison
Charmeleon             Fire
Charizard      Fire, Flying
Squirtle              Water
Name: Type, dtype: object

In [25]:
google.iloc[[2,4,5,6]]

2    2.758411
4    2.614201
5    2.613952
6    2.692408
Name: Price, dtype: float64

In [26]:
world_population.iloc[[0,2,4,5,6]]

Capital
Kabul               41128771
Algiers             44903225
Andorra la Vella       79824
Luanda              35588987
The Valley             15857
Name: 2022 Population, dtype: int64

In [27]:
pokeman_3.iloc[2:-6]


Name
Venusaur      Grass, Poison
Charmander             Fire
Charmeleon             Fire
Charizard      Fire, Flying
Squirtle              Water
                  ...      
Gholdengo      Steel, Ghost
Wo-Chien        Dark, Grass
Chien-Pao         Dark, Ice
Ting-Lu        Dark, Ground
Chi-Yu           Dark, Fire
Name: Type, Length: 1002, dtype: object

In [28]:
world_population.iloc[16:-1]

Capital
Dhaka         171186372
Bridgetown       281635
Minsk           9534954
Brussels       11655930
Belmopan         405272
                ...    
Hanoi          98186856
Mata-Utu          11572
El Aaiún         575986
Sanaa          33696614
Lusaka         20017675
Name: 2022 Population, Length: 217, dtype: int64

## Extract Series Value by Index Label
- Use the `loc` accessor to extract a **Series** value by its index label.
- Pass a list to extract multiple values by index label.
- If one index label/position in the list does not exist, Pandas will raise an error.

In [35]:
pokeman_3.loc["Ivysaur"]
google.loc[2]


2.758411

In [30]:
pokeman_3.loc[["Ivysaur", "Venusaur", "Charmander"]]

Name
Ivysaur       Grass, Poison
Venusaur      Grass, Poison
Charmander             Fire
Name: Type, dtype: object

## The get Method on a Series
- The `get` method extracts a **Series** value by index label. It is an alternative option to square brackets.
- The `get` method's second argument sets the fallback value to return if the label/position does not exist.
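
A minimal sketch of `get` on a two-row Series. The two populations are copied from the `world_population` output in these notes; "Atlantis" is a made-up label that does not exist in the index.

```python
import pandas as pd

population = pd.Series([41128771, 2842321], index=["Kabul", "Tirana"])

print(population.get("Kabul"))                  # label exists, value is returned
print(population.get("Atlantis"))               # missing label returns None
print(population.get("Atlantis", "Not found"))  # missing label returns the fallback
```

Unlike `loc`, `get` does not raise an error for a missing label, which makes it convenient when a label may or may not be present.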

In [99]:
world_population.head()

Capital
Kabul               41128771
Tirana               2842321
Algiers             44903225
Pago Pago              44273
Andorra la Vella       79824
Name: 2022 Population, dtype: int64

In [32]:
world_population.loc[["Abuja", "Accra", "Addis Ababa"]]

Capital
Abuja          218541212
Accra           33475870
Addis Ababa    123379924
Name: 2022 Population, dtype: int64

In [33]:
pokeman_3.get("TEslim", "Not_Found_in he list")


'Not_Found_in he list'

In [98]:
# The method is spelled value_counts; values_counts does not exist, hence the AttributeError below
world_population.get(["Kabul", "Berlin", "Paris", "London"], "Not_Found_in he list").values_counts(normalize=True)


AttributeError: 'Series' object has no attribute 'values_counts'

In [36]:
# "Capital" is the name of the index, not an index label, so this lookup raises a KeyError
world_population["Capital"].value_counts(normalize=True)

KeyError: 'Capital'

In [None]:
capitals = ["Kabul", "Berlin", "Paris", "London", "Nigeria"]

def check_list(labels):
    # `in` on a Series checks the index labels, not the values
    for label in labels:
        if label in world_population:
            print(label)
        else:
            print("Not Found")

check_list(capitals)

Kabul
Berlin
Paris
London
Not Found


## Overwrite a Series Value
- Use the `loc/iloc` accessor to target an index label/position, then use an equal sign to provide a new value.

In [37]:
pokeman_3.head()

Name
Bulbasaur     Grass, Poison
Ivysaur       Grass, Poison
Venusaur      Grass, Poison
Charmander             Fire
Charmeleon             Fire
Name: Type, dtype: object

In [38]:
pokeman_3.iloc[0]

'Grass, Poison'

In [40]:
pokeman_3.iloc[0] = "Teslim"

In [41]:
pokeman_3.head()

Name
Bulbasaur            Teslim
Ivysaur       Grass, Poison
Venusaur      Grass, Poison
Charmander             Fire
Charmeleon             Fire
Name: Type, dtype: object

In [None]:
pokeman_3.iloc[0] = "Teslim2"

In [42]:
pokeman_3.head()

Name
Bulbasaur            Teslim
Ivysaur       Grass, Poison
Venusaur      Grass, Poison
Charmander             Fire
Charmeleon             Fire
Name: Type, dtype: object

In [51]:
pokeman_3.iloc[[0,1,2]] = ["Teslim", "Teslim2", "Teslim3"]

In [52]:
pokeman_3.loc["Bulbasaur"] = "Teslim4"

In [53]:
pokeman_3.head()

Name
Bulbasaur     Teslim4
Ivysaur       Teslim2
Venusaur      Teslim3
Charmander       Fire
Charmeleon       Fire
Name: Type, dtype: object

## The copy Method
- A **copy** is a duplicate/replica of an object.
- Changes to a copy do not modify the original object.
- A **view** is a different way of looking at the *same* data.
- Changes to a view *do* modify the original object.
- The `copy` method creates a copy of a pandas object.
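
A minimal sketch of the `copy` method on a three-value Series: the change is made to the duplicate, and the original keeps its first value.

```python
import pandas as pd

original = pd.Series(["Bulbasaur", "Ivysaur", "Venusaur"])

# copy() returns an independent duplicate of the Series
duplicate = original.copy()
duplicate.iloc[0] = "Changed"

print(original.iloc[0])   # still the original value
print(duplicate.iloc[0])  # the overwritten value
```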

In [86]:
pokeman_df = pd.read_csv("pokemon.csv", usecols=["Name"])
pokeman_df.head()

Unnamed: 0,Name
0,Bulbasaur
1,Ivysaur
2,Venusaur
3,Charmander
4,Charmeleon


In [93]:
pokeman_series = pokeman_df.squeeze("columns").copy()
pokeman_series.head()

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [96]:
pokeman_series.iloc[0] = "Teslim"
pokeman_series.head()

0        Teslim
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [97]:
pokeman_df.head()

Unnamed: 0,Name
0,Bulbasaur
1,Ivysaur
2,Venusaur
3,Charmander
4,Charmeleon


## Math Methods on Series Objects
- The `count` method returns the number of values in the **Series**. It excludes missing values; the `size` attribute includes missing values.
- The `sum` method adds together the **Series's** values.
- The `product` method multiplies together the **Series's** values.
- The `mean` method calculates the average of the **Series's** values.
- The `std` method calculates the standard deviation of the **Series's** values.
- The `max` method returns the largest value in the **Series**.
- The `min` method returns the smallest value in the **Series**.
- The `median` method returns the median of the **Series** (the value in the middle).
- The `mode` method returns the mode of the **Series** (the most frequent value).
- The `describe` method returns a summary with various mathematical calculations.
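
The sketch below runs these methods on a made-up Series that contains one missing value, to show how `count` and `size` differ. The `readings` data is hypothetical.

```python
import pandas as pd

# Hypothetical data; None becomes NaN (a missing value)
readings = pd.Series([2.0, 4.0, 4.0, None, 10.0])

print(readings.count())   # 4: the missing value is excluded
print(readings.size)      # 5: the missing value is included
print(readings.sum())     # missing values are skipped by default
print(readings.mean())
print(readings.median())
print(readings.mode())    # returned as a Series, because several values can tie
print(readings.max(), readings.min())
print(readings.describe())
```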

## Broadcasting
- **Broadcasting** describes the process of applying an arithmetic operation to every element of an array (i.e., a **Series**).
- We can combine mathematical operators with a **Series** to apply the mathematical operation to every value.
- There are also methods to accomplish the same results (`add`, `sub`, `mul`, `div`, etc.)
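
A minimal sketch of broadcasting, using the same five values as the `price` Series in these notes. Each operator has a method equivalent that gives the same result.

```python
import pandas as pd

price = pd.Series([1, 2, 3, 4, 5])

# The scalar is applied to every value in the Series
plus_ten = price + 10
doubled = price * 2

# Method equivalents of the operators above
plus_ten_method = price.add(10)
doubled_method = price.mul(2)

print(plus_ten)
print(doubled)
```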

## The value_counts Method
- The `value_counts` method returns the number of times each unique value occurs in the **Series**.
- The `normalize` parameter returns the relative frequencies/percentages of the values instead of the counts.

In [116]:
pokeman_series.value_counts(normalize=True)*10000

Name
Teslim         9.90099
Honedge        9.90099
Vivillon       9.90099
Litleo         9.90099
Pyroar         9.90099
                ...   
Crawdaunt      9.90099
Baltoy         9.90099
Claydol        9.90099
Lileep         9.90099
Iron Leaves    9.90099
Name: proportion, Length: 1010, dtype: float64
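
Because every name in the output above is unique, all proportions are identical. As a contrast, here is a minimal sketch with a made-up Series in which one value repeats, so the counts and proportions differ.

```python
import pandas as pd

# Hypothetical sample data for illustration only
types = pd.Series(["Fire", "Water", "Fire", "Grass", "Fire"])

counts = types.value_counts()                # occurrences of each unique value
shares = types.value_counts(normalize=True)  # relative frequencies that sum to 1

print(counts)
print(shares)
```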

## The apply Method
- The `apply` method accepts a function. It invokes that function on every `Series` value.
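
A minimal sketch of `apply`, first with Python's built-in `len` and then with a custom function. The `shout` function is a hypothetical helper written for this example.

```python
import pandas as pd

names = pd.Series(["Bulbasaur", "Ivysaur", "Venusaur"])

# Pass the function itself (no parentheses); pandas calls it on each value
lengths = names.apply(len)

def shout(name):
    """Hypothetical helper: upper-case a name and add an exclamation mark."""
    return name.upper() + "!"

shouted = names.apply(shout)

print(lengths)
print(shouted)
```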

## The map Method
- The `map` method "maps" or connects each **Series** value to another value.
- We can pass the method a dictionary or a **Series**. Both types connect keys to values.
- The `map` method uses our argument to connect or bridge together the values.
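
A minimal sketch of `map` with both a dictionary and a Series as the lookup. The `weakness` mapping is made up for illustration and is not taken from the datasets in these notes.

```python
import pandas as pd

pokemon_types = pd.Series(
    ["Fire", "Water", "Fire"],
    index=["Charmander", "Squirtle", "Charmeleon"],
)

# Hypothetical lookup: each type (key) is connected to a weakness (value)
weakness = {"Fire": "Water", "Water": "Electric"}

# Each value in pokemon_types is looked up as a key in the mapping
mapped_from_dict = pokemon_types.map(weakness)

# A Series works the same way: its index labels act as the keys
mapped_from_series = pokemon_types.map(pd.Series(weakness))

print(mapped_from_dict)
```

Values with no matching key in the lookup become missing values (`NaN`) in the result.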