<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/e/ed/Pandas_logo.svg/300px-Pandas_logo.svg.png" >


# Data Series

### What is a Data Series?

In Pandas, a **Data Series** is a one-dimensional data structure. Simply put, it is a single column of data. A Data Series also stores an **index**: a label for each element.

## 1. How to create a Data Series?

In [34]:
import pandas as pd

In [62]:
cities = ["London", "Warsaw", "Berlin", "Paris"]
pd.Series(cities)

0    London
1    Warsaw
2    Berlin
3     Paris
dtype: object

In [64]:
pd.Series(["Cat", "Dog", "Parrot", "Hamster"])

0        Cat
1        Dog
2     Parrot
3    Hamster
dtype: object


If you want to create a Data Series with your **own index labels**, you can pass a **dictionary**: its keys become the index and its values become the data.

In [71]:
pet = {"name" : "Jack", "type" : "cat", "color" : "black"}
pd.Series(pet)

name      Jack
type       cat
color    black
dtype: object
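
A dictionary is not the only way to get custom labels. As a minimal sketch (the names `petValues` and `petLabels` are illustrative and not part of the original notebook), the same Series can be built by passing the labels through the `index` parameter of `pd.Series`:

```python
import pandas as pd

# Illustrative data: values and labels are kept in two separate lists.
petValues = ["Jack", "cat", "black"]
petLabels = ["name", "type", "color"]

# The `index` parameter assigns one label to each value, in order.
petSeries = pd.Series(petValues, index=petLabels)

# Label-based access works the same way as for a Series built from a dictionary.
print(petSeries["type"])
print(petSeries.index.tolist())
```

This form can be convenient when the values and the labels already live in separate lists.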

### Exercises:

1. Create a list of the days of the week and name it **weekdays**.
2. Create a Data Series object based on the **weekdays** list and assign it to the variable **weekdaysSeries**. Display this variable.
3. Create a list **freeDays**. It should contain the same number of elements as the **weekdays** list. At each position, assign **True** if the corresponding day is free or **False** if it is not.
4. Create a **freeDaysSeries** Data Series object based on the **freeDays** list. Display the created variable. What is the data type (dtype) of the resulting Series?
5. Create a dictionary **holidays**. Each key is the name of a holiday and each value is its date.
6. Create a Data Series **holidaysSeries** based on the created dictionary.

### Solutions:

In [102]:
weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
weekdaysSeries = pd.Series(weekdays)
weekdaysSeries

0       Monday
1      Tuesday
2    Wednesday
3     Thursday
4       Friday
5     Saturday
6       Sunday
dtype: object

In [108]:
freeDays = [False, False, False, False, False, True, True]
freeDaysSeries = pd.Series(freeDays)
freeDaysSeries

0    False
1    False
2    False
3    False
4    False
5     True
6     True
dtype: bool
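
The `dtype` printed at the bottom of each output is inferred by pandas from the contents of the list. The following is a minimal sketch with made-up sample lists (all variable names here are illustrative), showing a few common cases:

```python
import pandas as pd

# Made-up sample lists; pandas infers the dtype from the values.
intSeries = pd.Series([1, 2, 3])
floatSeries = pd.Series([1.5, 2.0, 3.25])
boolSeries = pd.Series([True, False, True])
mixedSeries = pd.Series([1, "two", 3.0])  # mixed Python types fall back to object

for series in (intSeries, floatSeries, boolSeries, mixedSeries):
    print(series.dtype)
```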

In [118]:
holidays = {"New year" : "2025-01-01", "Epiphany" : "2025-01-06", "Easter" : "2025-04-20"}
holidaysSeries = pd.Series(holidays)
holidaysSeries

New year    2025-01-01
Epiphany    2025-01-06
Easter      2025-04-20
dtype: object

## 2. Data Series attributes

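1. **size** - the number of elements in the Data Series.
2. **is_unique** - **True** if all elements are unique.
3. **is_monotonic_increasing / is_monotonic_decreasing** - **True** if the elements are in non-decreasing / non-increasing order (for strings, alphabetical order).
4. **values** - the values of the DS as an array.
5. **dtype** - the data type of the elements in the DS.
6. **shape** - the shape of the DS as a tuple.
7. **axes** - a list of the axes of the DS (for a Series, just the index).
8. **nbytes** - the number of bytes used by the values of the DS.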
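
The attributes listed above can be tried out on a small Series. This is only a sketch on made-up data (the variable `s` is illustrative); note that attributes are accessed without parentheses:

```python
import pandas as pd

# Small made-up Series used only to illustrate the attributes above.
s = pd.Series([10, 20, 30, 40])

print(s.size)                     # number of elements
print(s.is_unique)                # True when no value repeats
print(s.is_monotonic_increasing)  # True when the values never decrease
print(s.is_monotonic_decreasing)  # True when the values never increase
print(s.values)                   # underlying array of values
print(s.dtype)                    # data type of the elements
print(s.shape)                    # one-element tuple for a Series
print(s.axes)                     # list containing the index
print(s.nbytes)                   # bytes used by the values
```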

### Exercises:

1. Create a list of 100000 random numbers.
2. Create a **dataAsFloatSeries** DS object based on this list.
3. Display all attributes.
4. Create the list again, but this time convert each random number to a string.
5. Create a **dataAsStringSeries** DS object based on this list.
6. Display the attributes **size**, **nbytes**, and **dtype**.

### Solutions:

In [148]:
import random as rnd

dataAsFloatSeries = pd.Series([i*rnd.random() for i in range(100000)])
dataAsFloatSeries

0            0.000000
1            0.979406
2            1.959579
3            0.582970
4            1.155358
             ...     
99995    30502.665301
99996    74073.741310
99997    88220.968299
99998    69145.763693
99999    15304.667810
Length: 100000, dtype: float64

In [150]:
dataAsFloatSeries.size

100000

In [160]:
dataAsFloatSeries.is_unique

True

In [164]:
dataAsFloatSeries.nbytes

800000

In [166]:
dataAsFloatSeries.shape

(100000,)

In [176]:
dataAsStringSeries = pd.Series([str(i*rnd.random()) for i in range(100000)])
dataAsStringSeries

0                       0.0
1        0.5590690337915314
2        1.8840293434200341
3        0.5253851075830951
4         1.356009402868668
                ...        
99995     70872.99564548633
99996     32929.39784504684
99997    17457.305000124663
99998     85756.87108959926
99999     80728.02905133994
Length: 100000, dtype: object

In [180]:
dataAsStringSeries.size

100000

In [182]:
dataAsStringSeries.nbytes

800000

In [184]:
dataAsStringSeries.dtype

dtype('O')
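
Both Series above report the same `nbytes`, even though strings take more memory than floats. For an `object` Series, `nbytes` counts only the references stored in the array, not the Python string objects they point to. As a hedged sketch on a small made-up Series (the names `strings`, `shallow` and `deep` are illustrative), `memory_usage(deep=True)` can be used to include the objects themselves:

```python
import pandas as pd

# Small made-up Series of strings; dtype=object is requested explicitly
# so that it matches the object-dtype Series shown above.
strings = pd.Series(["0.5590690337915314", "1.8840293434200341", "0.0"], dtype=object)

# Size of the array of references only.
shallow = strings.nbytes

# Size including the Python string objects (index excluded for a fair comparison).
deep = strings.memory_usage(index=False, deep=True)

print(shallow, deep)
```

The exact numbers depend on the platform and Python version, so no expected output is given here.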

## 3. Data Series methods

1. sum() - the sum of the elements in the Data Series.
2. min() - the minimum value in the DS.
3. max() - the maximum value in the DS.
4. mean() - the mean of the elements in the DS.
5. count() - the number of non-missing elements in the DS.
6. product() - the product of all elements in the DS.
7. to_list() - converts the DS to a Python list.
8. add() - adds another Series or a scalar to the DS, element by element.
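
The methods listed above can be sketched on a small made-up Series (the names `s`, `plusTen`, `other` and `alignedSum` are illustrative). Unlike attributes, methods are called with parentheses:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

print(s.sum())      # 10
print(s.min())      # 1
print(s.max())      # 4
print(s.mean())     # 2.5
print(s.count())    # 4
print(s.product())  # 24
print(s.to_list())  # [1, 2, 3, 4]

# add() with a scalar adds it to every element.
plusTen = s.add(10)
print(plusTen.to_list())  # [11, 12, 13, 14]

# add() with another Series aligns on the index; fill_value stands in
# for labels that exist in only one of the two Series.
other = pd.Series([10, 20])
alignedSum = s.add(other, fill_value=0)
print(alignedSum.to_list())  # [11.0, 22.0, 3.0, 4.0]
```

The last result is a float Series because the alignment step introduces missing values before they are filled.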

### Exercises:
1. Create two lists:
   - **cities** with the names of three of the largest cities in the world: Shanghai, Beijing, Istanbul.
   - **population** with the number of citizens of each city in the **cities** list.
2. Create a **citypop** Data Series variable. The index of the DS should be the city names from **cities** and the values should come from **population**. Create this object by passing the arguments by position.
3. Repeat the previous step, but pass the arguments by name (as keyword arguments).
4. What is the total number of citizens?
5. What is the mean?
6. Show the index using a method and using an attribute.
7. Show the values of **citypop**.

### Solutions:

In [9]:
import pandas as pd
cities = ["Shanghai", "Beijing", "Istanbul"]
population = [24_870_000, 21_766_000, 15_660_000]

In [25]:
citypop = pd.Series(population, cities)
citypop

Shanghai    24870000
Beijing     21766000
Istanbul    15660000
dtype: int64

In [19]:
citypop2 = pd.Series(index = cities, data = population)
citypop2

Shanghai    24870000
Beijing     21766000
Istanbul    15660000
dtype: int64

In [27]:
citypop.sum()

62296000

In [29]:
citypop.mean()

20765333.333333332

In [31]:
citypop.index

Index(['Shanghai', 'Beijing', 'Istanbul'], dtype='object')
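
The cell above reads the index through the `index` attribute. A minimal sketch of the method-based variant, assuming the exercise means `Series.keys()` (which pandas documents as an alias for the index):

```python
import pandas as pd

cities = ["Shanghai", "Beijing", "Istanbul"]
population = [24_870_000, 21_766_000, 15_660_000]
citypop = pd.Series(population, index=cities)

# Attribute access: no parentheses.
print(citypop.index)

# Method access: keys() returns the same index object.
print(citypop.keys())
```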

In [39]:
citypop.values

array([24870000, 21766000, 15660000], dtype=int64)

## 4. Data Filtering in Data Series

### Boolean indexing

This is a fast and simple way of filtering data in a Data Series. Use it when you just want to show the filtered data and do not want to modify the original Series.

In [53]:
numbers = [1,2,3,4,5,6]
numbersSeries = pd.Series(numbers)
numbersSeries>3

0    False
1    False
2    False
3     True
4     True
5     True
dtype: bool

In [55]:
numbersSeries[numbersSeries>3]

3    4
4    5
5    6
dtype: int64

In [65]:
letters = list("abcd")
lettersSeries = pd.Series(letters)
lettersSeries > "b"

0    False
1    False
2     True
3     True
dtype: bool

In [69]:
lettersSeries[lettersSeries>"b"]

2    c
3    d
dtype: object

In [73]:
lettersSeries

0    a
1    b
2    c
3    d
dtype: object
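
Boolean conditions can also be combined. The sketch below is an illustration on the same numbers as above (the result names `between`, `outside` and `notBig` are illustrative); it uses `&` (and), `|` (or) and `~` (not), and each condition needs its own parentheses:

```python
import pandas as pd

numbersSeries = pd.Series([1, 2, 3, 4, 5, 6])

# Values greater than 2 AND less than 6.
between = numbersSeries[(numbersSeries > 2) & (numbersSeries < 6)]
print(between.to_list())  # [3, 4, 5]

# Values less than 2 OR greater than 5.
outside = numbersSeries[(numbersSeries < 2) | (numbersSeries > 5)]
print(outside.to_list())  # [1, 6]

# NOT greater than 3.
notBig = numbersSeries[~(numbersSeries > 3)]
print(notBig.to_list())  # [1, 2, 3]
```

Python's `and`, `or` and `not` keywords do not work element by element on a Series, which is why the operator forms are used here.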

### Masking 

This is a more complex way of filtering data in a Data Series that gives us more control. We can modify the original Data Series, or we can replace the values that do not match the condition. We can also simply drop the non-matching data without modifying the original Series.

In [82]:
numbers = [1,2,3,4,5,6]
numbersSeries = pd.Series(numbers)
numbersSeries > 3

0    False
1    False
2    False
3     True
4     True
5     True
dtype: bool

In [91]:
numbersSeries.where(numbersSeries > 3)

0    NaN
1    NaN
2    NaN
3    4.0
4    5.0
5    6.0
dtype: float64

In [93]:
numbersSeries.where(numbersSeries > 3, other = "not passing")

0    not passing
1    not passing
2    not passing
3              4
4              5
5              6
dtype: object

In [97]:
numbersSeries.where(numbersSeries > 3).dropna()

3    4.0
4    5.0
5    6.0
dtype: float64

In [105]:
numbersSeries.where(numbersSeries > 3, inplace=True)
numbersSeries

0    NaN
1    NaN
2    NaN
3    4.0
4    5.0
5    6.0
dtype: float64

In [107]:
numbersSeries.dropna()

3    4.0
4    5.0
5    6.0
dtype: float64

In [109]:
numbersSeries

0    NaN
1    NaN
2    NaN
3    4.0
4    5.0
5    6.0
dtype: float64

In [111]:
numbersSeries.dropna(inplace=True)

In [113]:
numbersSeries

3    4.0
4    5.0
5    6.0
dtype: float64
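
In the outputs above the values became `float64` because `NaN` cannot be stored in an integer Series. The sketch below (result names are illustrative) shows two related options: `mask()`, which hides the values where the condition is **True** (the opposite of `where()`), and reassignment to a new variable instead of `inplace=True`, with a cast back to integers once the `NaN` values are dropped:

```python
import pandas as pd

numbersSeries = pd.Series([1, 2, 3, 4, 5, 6])

# where() keeps values where the condition is True, NaN elsewhere.
kept = numbersSeries.where(numbersSeries > 3)

# mask() does the opposite: NaN where the condition is True.
hidden = numbersSeries.mask(numbersSeries > 3)

# Reassignment leaves the original untouched; astype restores the integer dtype.
filtered = numbersSeries.where(numbersSeries > 3).dropna().astype("int64")

print(kept.to_list())
print(hidden.to_list())
print(filtered.to_list())  # [4, 5, 6]
```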

### Filtering based on index

In [119]:
numbersSeries = pd.Series(numbers)
numbersSeries

0    1
1    2
2    3
3    4
4    5
5    6
dtype: int64

In [125]:
numbersSeries.filter(items=[0,2,4])

0    1
2    3
4    5
dtype: int64
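
Besides `items`, the `filter()` method also accepts `like` (substring match on the labels) and `regex` (regular expression match on the labels). A minimal sketch on a made-up Series with string labels (`scores` and its labels are illustrative):

```python
import pandas as pd

# Made-up Series with string labels.
scores = pd.Series([3, 5, 8, 13], index=["alpha_1", "alpha_2", "beta_1", "beta_2"])

# Exact labels; labels that do not exist are silently skipped.
byItems = scores.filter(items=["alpha_1", "missing"])
print(byItems.to_list())  # [3]

# Labels containing a substring.
byLike = scores.filter(like="beta")
print(byLike.to_list())  # [8, 13]

# Labels matching a regular expression (here: ending in "_1").
byRegex = scores.filter(regex=r"_1$")
print(byRegex.to_list())  # [3, 8]

# .loc selects by label as well, but expects every label to exist.
byLoc = scores.loc[["alpha_1", "beta_2"]]
print(byLoc.to_list())  # [3, 13]
```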

### Exercises: 

In [138]:
accidentsNumber = [14,334,312,5823,9491,7486,4343]
age = [ "<6", "7-14", "15-17", "18-24", "25-39", "40-59", ">60"]
accidents = pd.Series(index = age, data = accidentsNumber)

1. Show only the age groups that caused more than 1000 accidents. First display the result including the **NaN** values, then remove them.
2. Assign the result of the previous command to a variable named **incident1000**. Display it.
3. Make sure that the original Data Series has not changed.
4. Display data only for the **18-59** age range.
5. Filter the original Data Series so that it keeps only the positions with 1000 accidents or fewer. The object should contain only these values (modify the original structure).

### Solutions: 

In [143]:
accidents.where(accidents > 1000)

<6          NaN
7-14        NaN
15-17       NaN
18-24    5823.0
25-39    9491.0
40-59    7486.0
>60      4343.0
dtype: float64

In [155]:
accidents.where(accidents > 1000).dropna().index

Index(['18-24', '25-39', '40-59', '>60'], dtype='object')

In [157]:
accidents.where(accidents > 1000).dropna()

18-24    5823.0
25-39    9491.0
40-59    7486.0
>60      4343.0
dtype: float64

In [159]:
accidents[accidents > 1000]

18-24    5823
25-39    9491
40-59    7486
>60      4343
dtype: int64

In [161]:
accidents[accidents > 1000].index

Index(['18-24', '25-39', '40-59', '>60'], dtype='object')

In [165]:
incident1000 = accidents.where(accidents > 1000).dropna()
incident1000

18-24    5823.0
25-39    9491.0
40-59    7486.0
>60      4343.0
dtype: float64

In [167]:
accidents

<6         14
7-14      334
15-17     312
18-24    5823
25-39    9491
40-59    7486
>60      4343
dtype: int64

In [173]:
accidents.filter(items = ["18-24", "25-39", "40-59"])

18-24    5823
25-39    9491
40-59    7486
dtype: int64

In [177]:
accidents.where(accidents <= 1000, inplace = True)
accidents.dropna(inplace = True)
accidents

<6        14.0
7-14     334.0
15-17    312.0
dtype: float64

## 5. More complex filtering

### Exercises:

In [190]:
namesList = ['Albania','Austria','Belarus',
'Belgium','Bulgaria','Croatia','Cyprus','Czech Republic','Denmark','Estonia',
'Finland','France','Germany','Greece','Hungary','Iceland','Ireland','Italy',
'Latvia','Lithuania','Luxembourg','Macedonia','Malta','Montenegro','Netherlands',
'Norway','Poland','Portugal','Romania','Russia','Serbia','Slovenia','Spain', 'Sweden','Switzerland','United Kingdom','Turkey','Ukraine']
energy2010List = [1947,8347,3564,8369,4560,3814,4623,6348,6328,6506,16483,7736,7264,5318,3876,
51440,5911,5494,3230,3471,16830,3521,4171,5420,7010,24891,3797,4959,2551,
6410,4359,6521,5707,14934,8175,2498,3550,5701]
energy2012List  = [2118,8507,3698,7987,4762,3819,4057,6305,6039,6689,15687,7344,7270,5511,3919,
53203,5665,5398,3588,3608,14696,3626,4761,5416,6871,23658,3899,4736,2604,
6617,4387,6778,5573,14290,7886,2794,3641,5452]

namesSeries = pd.Series(namesList)
energy2010Series = pd.Series(energy2010List)
energy2012Series = pd.Series(energy2012List)

1. Create variables for the mean energy usage in 2010 and in 2012.
2. Compare the energy usage in 2010 and in 2012, separately, with the corresponding mean (above the mean).
3. Show only the countries whose usage was above the mean in both 2010 and 2012.
4. Compare the energy usage in 2010 with the 2010 mean (below the mean).
5. Find the countries whose usage was below the mean in 2010 and, at the same time, above the mean in 2012.

### Solutions:

In [205]:
mean2010 = energy2010Series.mean()
mean2012 = energy2012Series.mean()

filterAboveMean2010 = energy2010Series > mean2010
filterAboveMean2012 = energy2012Series > mean2012

namesSeries.where(filterAboveMean2010 & filterAboveMean2012).dropna()

1         Austria
3         Belgium
10        Finland
15        Iceland
20     Luxembourg
25         Norway
33         Sweden
34    Switzerland
dtype: object

In [211]:
filterBelowMean2010 = energy2010Series < mean2010

namesSeries.where(filterBelowMean2010 & filterAboveMean2012).dropna()

Series([], dtype: object)
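
The solution above keeps country names and energy values in separate Series that share the default integer index. As an alternative sketch, the country names can serve as the index itself, so boolean indexing returns labelled results directly. The four rows below are taken from the lists above, but the means are computed on this subset only and therefore differ from the full-data means; the variable names are illustrative:

```python
import pandas as pd

# Subset of the data above, indexed by country name.
names = ["Albania", "Austria", "Belgium", "France"]
energy2010 = pd.Series([1947, 8347, 8369, 7736], index=names)
energy2012 = pd.Series([2118, 8507, 7987, 7344], index=names)

# Boolean filters relative to the subset means.
above2010 = energy2010 > energy2010.mean()
above2012 = energy2012 > energy2012.mean()

# Above the mean in both years.
aboveBoth = energy2010[above2010 & above2012]
print(aboveBoth.index.tolist())  # ['Austria', 'Belgium', 'France']

# Below or equal to the mean in both years.
belowBoth = energy2010[~above2010 & ~above2012]
print(belowBoth.index.tolist())  # ['Albania']
```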

## 6. Reading Series Objects