
# **PANDAS**

* Pandas has two main data structures: the Series and the DataFrame.
* Every column in a DataFrame is a Series, and Series is built on NumPy.
* If you understand Series, the rest of pandas becomes much easier to follow.
* A pandas Series is a one-dimensional labelled array capable of holding data of various types (e.g. int, str, float).
* It is one of the core data structures provided by the pandas library, and it resembles a column in a spreadsheet or a single column in a SQL table.
* Each element in a Series has a label, called its 'index', which makes it easy to access, manipulate and perform operations on the data.
* Here is how to create a pandas Series, along with some key attributes and methods associated with it:


**Creating a series**:

You can create a Series using the pd.Series() constructor by passing a list, array or dictionary-like object as the data.

In [None]:
# import library

import pandas as pd

In [None]:
# creating a series from a list
# printing a series shows the index alongside the values

data = [10, 20, 30, 40, 50]       # a list
series_data = pd.Series(data)     # pd.Series is a class in pandas; this converts the list to a Series

print(series_data)

0    10
1    20
2    30
3    40
4    50
dtype: int64
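The text above mentions that an array can also be passed to the constructor, but only a list is shown. A minimal sketch, assuming NumPy is installed and using the made-up names `array_data` and `series_from_array`:

```python
import numpy as np
import pandas as pd

# a NumPy array is accepted by pd.Series in the same way as a list
array_data = np.array([10, 20, 30, 40, 50])
series_from_array = pd.Series(array_data)

# the default index is 0, 1, 2, ... just like with a list
print(series_from_array)
```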


In [None]:
## creating a series with custom index labels

data_index = ['one', 'two', 'three', 'four', 'five']
amount = [100, 500, 300, 5000, 900]       # alternative values; the series here is built from `data`
series_data_index = pd.Series(data, index=data_index)

print(series_data_index)

one      10
two      20
three    30
four     40
five     50
dtype: int64


In [None]:
# creating a series from a dictionary (the keys become the index labels)

data_dict = {'one': 10, 'two': 20, 'three': 30, 'four': 40, 'five': 50}
dict_to_series = pd.Series(data_dict)

print (dict_to_series)

one      10
two      20
three    30
four     40
five     50
dtype: int64
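When a dictionary is combined with an explicit `index`, pandas picks values by key, and labels with no matching key come out as NaN. A small sketch with hypothetical data (the name `reordered` is made up for illustration):

```python
import pandas as pd

data_dict = {'one': 10, 'two': 20, 'three': 30}

# 'four' is not a key in data_dict, so its value becomes NaN
# (the dtype becomes float64 because NaN is a float)
reordered = pd.Series(data_dict, index=['three', 'one', 'four'])

print(reordered)
```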


# **ACCESSING DATA IN A SERIES**

*  You can access data in a series using indexing and slicing




In [None]:
## accessing a single element by label

dict_to_series['two']        # pass the index label (here, the dictionary key)

20

In [None]:
## accessing a single element by its label in the default integer index

series_data[2]

30
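Plain square brackets can mean either "label" or "position" depending on the index, which is easy to mix up. A hedged sketch of the explicit alternatives, `.loc` (label) and `.iloc` (position), using a made-up series called `prices`:

```python
import pandas as pd

# hypothetical example data
prices = pd.Series([10, 20, 30], index=['one', 'two', 'three'])

by_label = prices.loc['two']     # look up by index label
by_position = prices.iloc[1]     # look up by integer position (0-based)

print(by_label, by_position)
```

Both lookups return the same element here; being explicit simply removes the guesswork about which kind of lookup is intended.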

In [None]:
# slicing by position to get a subset of a series
# as with a list, the stop position (4) is excluded

dict_to_series[1:4]

two      20
three    30
four     40
dtype: int64

In [None]:
# OR
# slicing by label to get a subset of a series

dict_to_series['two':'four']       # NB: label-based slicing includes the last label, unlike list slicing

two      20
three    30
four     40
dtype: int64
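The difference between the two slices above can be made explicit with `.iloc` and `.loc`. This is a minimal sketch on a hypothetical series named `numbers`, not the only way to slice:

```python
import pandas as pd

numbers = pd.Series([10, 20, 30, 40, 50],
                    index=['one', 'two', 'three', 'four', 'five'])

# positional slice: the stop position (4) is excluded, as with a list
positional = numbers.iloc[1:4]

# label slice: both the start and the stop label are included
labelled = numbers.loc['two':'four']

print(positional.equals(labelled))
```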

In [None]:
# accessing multiple elements by label

dict_to_series[['two', 'four']]        # the inner brackets are a list of labels; the outer brackets do the indexing

# the result is still a one-dimensional series containing only the selected labels

two     20
four    40
dtype: int64
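To confirm what a list of labels returns, the result can be inspected directly. A small sketch with made-up data; the labels come back in the order they were requested:

```python
import pandas as pd

numbers = pd.Series({'one': 10, 'two': 20, 'three': 30, 'four': 40})

# ask for the labels in a different order than they are stored
subset = numbers[['four', 'two']]

print(type(subset).__name__)   # the result is a Series
print(subset.ndim)             # number of dimensions
print(subset)
```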

# **KEY ATTRIBUTES OF A SERIES**

* values: returns the data in the series as a NumPy array
* index: returns the index (the labels) of the series
* dtype: returns the data type of the elements in the series
* name: returns the name of the series




In [None]:
series_data

0    10
1    20
2    30
3    40
4    50
dtype: int64

In [None]:
series_data.shape

(5,)

In [None]:
# .values returns the data without the index
# the result is a 1D NumPy array

series_data.values

array([10, 20, 30, 40, 50])

In [None]:
# .index tells us the range of the index:
# start, stop and step
# these labels are what you use to look values up in the series

series_data.index

RangeIndex(start=0, stop=5, step=1)

In [None]:
# .dtype tells us the data type of the elements in the series
# we can also change the type if we want to (see .astype)

series_data.dtype

dtype('int64')
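The `name` attribute from the list of key attributes is not shown in the cells here, and neither is changing the type. A minimal sketch of both, assuming a made-up series called `amounts` with the illustrative name 'amount':

```python
import pandas as pd

# name is optional; it is None unless you set it
amounts = pd.Series([10, 20, 30], name='amount')
print(amounts.name)

# .astype returns a new series with the requested dtype;
# the original series is left unchanged
as_float = amounts.astype('float64')
print(as_float.dtype)
```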

In [None]:
# adding new rows

series_data[5] = 60
series_data[6] = 70
series_data[7] = 70

In [None]:
## the new rows have been added

series_data

0    10
1    20
2    30
3    40
4    50
5    60
6    70
7    70
dtype: int64
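Assigning to a label that does not exist yet adds one row at a time. When several rows need to be added at once, `pd.concat` is one alternative. The sketch below uses hypothetical series named `base` and `extra`:

```python
import pandas as pd

base = pd.Series([10, 20, 30])

# give the new rows their own labels so the index stays unique
extra = pd.Series([40, 50], index=[3, 4])

# pd.concat returns a new series; base and extra are not modified
combined = pd.concat([base, extra])

print(combined)
```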

In [None]:
# .head() gives us the first five elements of a series or a dataframe
# don't forget the parentheses

series_data.head()

0    10
1    20
2    30
3    40
4    50
dtype: int64

In [None]:
# to specify the number of rows we want

series_data.head(7)         ## it will give the first seven rows

0    10
1    20
2    30
3    40
4    50
5    60
6    70
dtype: int64

In [None]:
# to specify the number of rows we want

series_data.head(3)         ## it will give the first three rows

0    10
1    20
2    30
dtype: int64
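The counterpart of `.head()` is `.tail()`, which returns rows from the end of the series. A short sketch on a made-up series with the same values used above:

```python
import pandas as pd

numbers = pd.Series([10, 20, 30, 40, 50, 60, 70, 70])

# .tail() with no argument returns the last five rows
last_five = numbers.tail()

# pass a number to choose how many rows to return
last_three = numbers.tail(3)

print(last_three)
```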

In [None]:
# .describe() gives us summary statistics for the column
# don't forget the parentheses
# the result is a series of floats

series_data.describe()

count     8.000000
mean     43.750000
std      22.638463
min      10.000000
25%      27.500000
50%      45.000000
75%      62.500000
max      70.000000
dtype: float64

In [None]:
## to make .describe() appear as integers
# .astype is used to change the data type of a series

series_data.describe().astype(int)        # the values now appear as integers (the decimal part is dropped)

count     8
mean     43
std      22
min      10
25%      27
50%      45
75%      62
max      70
dtype: int64
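Note that `.astype(int)` drops the decimal part rather than rounding, which is why the standard deviation of about 22.64 shows up as 22. If rounding is what you want, `.round()` is one option. A sketch on a made-up series with the same values:

```python
import pandas as pd

numbers = pd.Series([10, 20, 30, 40, 50, 60, 70, 70])
summary = numbers.describe()

truncated = summary.astype(int)   # drops the fractional part
rounded = summary.round(2)        # keeps two decimal places

print(truncated['std'], rounded['std'])
```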

In [None]:
# .mean(), in case you don't want to read it from .describe()
# don't forget the parentheses
# every piece of code we have run so far operates on a single column, not the whole dataset

series_data.mean()

43.75

In [None]:
# maximum

series_data.max()

70

In [None]:
# minimum

series_data.min()

10

In [None]:
# standard deviation

series_data.std()

22.638462845343543

In [None]:
# variance

series_data.var()

512.5

In [None]:
# sum

series_data.sum()

350
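One detail worth knowing about `.std()` and `.var()`: by default pandas computes the sample versions (dividing by n - 1), which is controlled by the `ddof` parameter. A minimal sketch on a made-up series with the same values as above:

```python
import pandas as pd

numbers = pd.Series([10, 20, 30, 40, 50, 60, 70, 70])

sample_var = numbers.var()            # default ddof=1: divides by n - 1
population_var = numbers.var(ddof=0)  # ddof=0: divides by n

# the variance is the square of the standard deviation
std_squared = numbers.std() ** 2

print(sample_var, population_var, std_squared)
```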

# **ELEMENT WISE OPERATORS**

*  You can perform element-wise operations on series, such as addition, subtraction, multiplication and division




In [None]:
series1 = pd.Series([1,2,3])
series2 = pd.Series([4,5,6])

print(series1)
print(series2)


0    1
1    2
2    3
dtype: int64
0    4
1    5
2    6
dtype: int64


In [None]:
# element-wise addition
## we can add two series together to get a new column

series_add = series1 + series2					# the result is a new series
# series_add is the new column

print(series_add)

0    5
1    7
2    9
dtype: int64


In [None]:
# element-wise subtraction
## we can subtract one series from another to get a new column

series_subtraction = series1 - series2					# the result is a new series
# series_subtraction is the new column

print(series_subtraction)

0   -3
1   -3
2   -3
dtype: int64


In [None]:
# element-wise multiplication
## we can multiply two series together to get a new column

series_multiplication = series1 * series2					# the result is a new series


print(series_multiplication)

0     4
1    10
2    18
dtype: int64


In [None]:
# element-wise division
## we can divide one series by another to get a new column (the result is a float)

series_division = series2 / series1 				# the result is a new series

print(series_division)

0    4.0
1    2.5
2    2.0
dtype: float64


In [None]:
# element-wise power
## we can raise one series to the power of another to get a new column

series_power = series2 ** series1 				# the result is a new series

print(series_power)

0      4
1     25
2    216
dtype: int64
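The operations above work row by row because both series share the same index. In general pandas matches elements by label, not by position, so labels that appear in only one series produce NaN. A sketch with hypothetical series `left` and `right`, including the `.add(..., fill_value=0)` method as one way to handle missing labels:

```python
import pandas as pd

left = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
right = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# labels are aligned first; 'a' and 'd' exist on one side only, so they become NaN
aligned_sum = left + right

# fill_value=0 treats a label missing from one side as 0 instead
filled_sum = left.add(right, fill_value=0)

print(aligned_sum)
print(filled_sum)
```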


# **FILTERING AND BOOLEAN SELECTION**

In [None]:

print(series_data > 30)             # this is a boolean series (True/False per element), not a dataframe

print()

## filtering values greater than 30

filter_series = series_data[series_data > 30]          # keeps only the elements where the condition is True

print(filter_series)


0    False
1    False
2    False
3     True
4     True
5     True
6     True
7     True
dtype: bool

3    40
4    50
5    60
6    70
7    70
dtype: int64


In [None]:
## boolean selection: selecting values that meet a condition

series_data[series_data % 2 == 0]         # keeps only the elements whose remainder after dividing by 2 is 0 (the even values)

0    10
1    20
2    30
3    40
4    50
5    60
6    70
7    70
dtype: int64
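Conditions can also be combined. With pandas you use `&` (and), `|` (or) and `~` (not) instead of the Python keywords, and each condition needs its own parentheses. A minimal sketch on a made-up series with the same values:

```python
import pandas as pd

numbers = pd.Series([10, 20, 30, 40, 50, 60, 70, 70])

# AND: both conditions must be True
middle = numbers[(numbers > 20) & (numbers < 60)]

# OR: at least one condition must be True
extremes = numbers[(numbers < 20) | (numbers > 60)]

# NOT: flips True and False
not_above_30 = numbers[~(numbers > 30)]

print(middle.tolist(), extremes.tolist(), not_above_30.tolist())
```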