## Importing pandas and numpy
* pandas is used for manipulating and analysing data

In [1]:
import pandas as pd
import numpy as np

## Creating a Series using pandas
* A Series is similar to a python list or a numpy array, but its index can be changed, so you can refer to elements using custom labels.

* pd in pd.Series refers to the pandas library and Series is a class in that library. We call it with (), and the [] inside holds the list of values we pass in.

* The word Series needs to be written with a capital S :)

In [2]:
s1=pd.Series([2.4,4.6,7.8])
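
Before any custom index is set, a Series gets default integer labels starting at 0. A minimal sketch of that, using a hypothetical variable named example that mirrors the values of s1:

```python
import pandas as pd

# Hypothetical example series with the same values as s1.
example = pd.Series([2.4, 4.6, 7.8])

# With no index given, the labels are simply 0, 1, 2.
print(list(example.index))  # [0, 1, 2]
print(example[0])           # 2.4
```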

* Naming our Series - the series name is set with .name

In [3]:
s1.name="Products"

* Changing the indexes of the elements

In [4]:
s1.index=["volkswagen","volvo","hyundai"]

* How does s1 look now? Let's check

In [5]:
s1

volkswagen    2.4
volvo         4.6
hyundai       7.8
Name: Products, dtype: float64

* Now, the car brands are the indexes
* We can refer to them using the brands:

In [6]:
s1["hyundai"]

7.8

* Or even call up 2 of them using double brackets

In [7]:
s1[["volvo","hyundai"]]

volvo      4.6
hyundai    7.8
Name: Products, dtype: float64

* Let's try doing everything within the series creation:
  * We first pass the list of values
  * Then, we close the [] but keep the () open, and after a comma we pass the name of our Series
  * After another comma, we can set the index
  * To finish, we close the ()

In [8]:
s2=pd.Series(["safest",
              "safer",
              "safe"],
            name="Car Brand Safety Ratings",
            index=["volvo","mercedes-benz","hyundai"])

* We can also set the index when passing the values to the Series.
  * We use {} instead of [], which creates a so-called dictionary. Each entry has a key and a value, and the keys become the index.

In [9]:
s3=pd.Series({"safest":"volvo",
             "safer":"mercedes-benz",
              "safe":"hyundai"},
              name="Car Brand Safety Ratings")

In [10]:
s3

safest            volvo
safer     mercedes-benz
safe            hyundai
Name: Car Brand Safety Ratings, dtype: object

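* As you can see, the dictionary keys became the index. Note that compared to s2 the roles are swapped: here the ratings are the index and the brands are the values.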

* If you want to refer to the values by their position number, you can use iloc:

In [11]:
s3.iloc[0]

'volvo'

* Or multiple:

In [12]:
s3.iloc[[0,2]]

safest      volvo
safe      hyundai
Name: Car Brand Safety Ratings, dtype: object

* Or get the last one using a negative position:

In [13]:
s3.iloc[-1]

'hyundai'

* You can also choose a range of items. When slicing by label, both endpoints are included:

In [14]:
s3["safest":"safe"]

safest            volvo
safer     mercedes-benz
safe            hyundai
Name: Car Brand Safety Ratings, dtype: object
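
Slicing by label and slicing by position treat the end of the range differently. A small sketch of the difference, assuming a hypothetical series named ratings that copies s3:

```python
import pandas as pd

# Hypothetical copy of s3 from above.
ratings = pd.Series({"safest": "volvo",
                     "safer": "mercedes-benz",
                     "safe": "hyundai"})

# Label slices include both endpoints.
by_label = ratings.loc["safest":"safer"]

# Position slices exclude the stop position, like python lists.
by_position = ratings.iloc[0:1]

print(len(by_label))     # 2
print(len(by_position))  # 1
```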

## Boolean Series - Pandas

* Create a series that contains numbers:

In [15]:
XC_60_base=pd.Series({"XC-60 Momentum": 40000,
                    "XC-60 R-design": 50000,
                    "XC-60 Inscription": 60000
                   },
                  name="Volvo XC-60 2021 prices")

In [16]:
XC_60_base

XC-60 Momentum       40000
XC-60 R-design       50000
XC-60 Inscription    60000
Name: Volvo XC-60 2021 prices, dtype: int64

* Let's say that the pro version of each trim costs 4000 more:

In [17]:
XC_60_PRO=XC_60_base+4000

In [18]:
XC_60_PRO

XC-60 Momentum       44000
XC-60 R-design       54000
XC-60 Inscription    64000
Name: Volvo XC-60 2021 prices, dtype: int64

* We can also use *, / and -, and even comparisons such as >, < and == (which give a True/False value for each element)

In [19]:
XC_60_base>50000

XC-60 Momentum       False
XC-60 R-design       False
XC-60 Inscription     True
Name: Volvo XC-60 2021 prices, dtype: bool

In [20]:
XC_60_base

XC-60 Momentum       40000
XC-60 R-design       50000
XC-60 Inscription    60000
Name: Volvo XC-60 2021 prices, dtype: int64

* As you can see, XC_60_base itself is unchanged: the result of the comparison doesn't get stored, so if we want to keep it we need to assign it to a new variable.
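
A short sketch of keeping such results, assuming hypothetical names prices, is_expensive and half_price. Each operation returns a new Series and leaves the original alone:

```python
import pandas as pd

# Hypothetical prices, same numbers as XC_60_base.
prices = pd.Series({"Momentum": 40000,
                    "R-design": 50000,
                    "Inscription": 60000})

# Store the results in new variables to keep them.
is_expensive = prices > 50000
half_price = prices / 2

print(is_expensive.tolist())  # [False, False, True]
print(half_price.tolist())    # [20000.0, 25000.0, 30000.0]
print(prices.tolist())        # [40000, 50000, 60000], unchanged
```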

* What about selecting data? Let's see:
    * You can select data based on a condition

In [21]:
XC_60_base[XC_60_base>=50000]

XC-60 R-design       50000
XC-60 Inscription    60000
Name: Volvo XC-60 2021 prices, dtype: int64

* Let's get the mean value:
    * The mean is the sum of all the values divided by the number of values

In [22]:
XC_60_base.mean()

50000.0
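
To check that definition, the mean can be computed by hand and compared with .mean(). This is only a sketch with a hypothetical series named prices:

```python
import pandas as pd

# Hypothetical prices, same numbers as XC_60_base.
prices = pd.Series([40000, 50000, 60000])

# Sum of the values divided by how many values there are.
manual_mean = prices.sum() / len(prices)

print(manual_mean)    # 50000.0
print(prices.mean())  # 50000.0
```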

* Mean can be used in data selection too:

In [23]:
XC_60_PRO[XC_60_PRO>XC_60_base.mean()]

XC-60 R-design       54000
XC-60 Inscription    64000
Name: Volvo XC-60 2021 prices, dtype: int64

* Using .std(), we can calculate the standard deviation, which is the square root of the variance. The variance is calculated similarly to the mean, but it averages the squared differences between each value and the mean (by default pandas divides by n-1 rather than n).
* Using the standard deviation, we can judge which values are 'normal' and which are 'extra-large' or 'extra-small'.
* The further the data lie from the mean, the higher the deviation, meaning more 'spread out' data.

TAKE A MOMENT TO READ THAT AGAIN, SINCE IT'S VERY IMPORTANT FOR YOU TO UNDERSTAND THIS.
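
The standard deviation can be worked out step by step to see where the number comes from. The sketch below assumes a hypothetical series named prices and uses the n-1 divisor, which is what .std() uses by default:

```python
import numpy as np
import pandas as pd

# Hypothetical prices, same numbers as XC_60_base.
prices = pd.Series([40000, 50000, 60000])

# Squared distance of each value from the mean.
squared_diffs = (prices - prices.mean()) ** 2

# Sample variance: divide by n-1, then take the square root.
manual_var = squared_diffs.sum() / (len(prices) - 1)
manual_std = np.sqrt(manual_var)

print(manual_std)    # 10000.0
print(prices.std())  # 10000.0
```

Passing ddof=0 to .std() divides by n instead, which gives the population standard deviation.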

### LOGICAL OPERATORS

* In python, the logical operators are the keywords and, or, not. These don't work element by element on a Series, so with pandas we use:
    * ~ not
    * | or
    * & and

In [24]:
XC_60_base[((XC_60_base<=50000) & ~(XC_60_base>65000))]

XC-60 Momentum    40000
XC-60 R-design    50000
Name: Volvo XC-60 2021 prices, dtype: int64
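
A sketch of why the symbols are needed, assuming hypothetical names prices, mask and raised. Each comparison goes in its own parentheses because & and | bind more tightly than the comparison operators, and the and keyword raises an error on a Series:

```python
import pandas as pd

# Hypothetical prices, same numbers as XC_60_base.
prices = pd.Series({"Momentum": 40000,
                    "R-design": 50000,
                    "Inscription": 60000})

# Element-wise "and": wrap each comparison in parentheses.
mask = (prices >= 45000) & (prices < 60000)
print(prices[mask])

# The and keyword asks for a single True/False, which a Series can't give.
raised = False
try:
    prices[(prices >= 45000) and (prices < 60000)]
except ValueError as err:
    raised = True
    print("ValueError:", err)
```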

* numpy functions such as np.log also work on every element of a Series:

In [25]:
np.log(XC_60_base)

XC-60 Momentum       10.596635
XC-60 R-design       10.819778
XC-60 Inscription    11.002100
Name: Volvo XC-60 2021 prices, dtype: float64

## Modifying data in series

* You can modify data by referring to it using its index:

In [36]:
s3["safe"]="Subaru"

* As you can see below, the value of "safe" has changed

In [37]:
s3

safest            volvo
safer     mercedes-benz
safe             Subaru
Name: Car Brand Safety Ratings, dtype: object

* Or use iloc, which lets you refer to an element by its position number:

In [38]:
s3.iloc[1]="Rolls Royce"

In [39]:
s3

safest          volvo
safer     Rolls Royce
safe           Subaru
Name: Car Brand Safety Ratings, dtype: object

* Let's create another series to try modifying several values at once:

In [54]:
merc=pd.Series({"C Class": 50000,
                    "S Class": 60000,
                    "E Class": 65000,
                    "GLC Class": 60000,
                    "GLB Class": 50000
                   },
                  name="Mercedes catalogue prices")



* We can also modify values based on a condition, for example set all values above 60000 to 55000:

In [55]:
merc[merc>60000]=55000

In [56]:
merc

C Class      50000
S Class      60000
E Class      55000
GLC Class    60000
GLB Class    50000
Name: Mercedes catalogue prices, dtype: int64
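
The assignment above changes merc itself. If the original values should be kept, one alternative is the .mask() method, which returns a new Series. A minimal sketch with hypothetical names prices and capped:

```python
import pandas as pd

# Hypothetical prices, a shortened copy of merc.
prices = pd.Series({"C Class": 50000,
                    "S Class": 60000,
                    "E Class": 65000})

# mask() replaces values where the condition is True and returns a new Series.
capped = prices.mask(prices > 60000, 55000)

print(capped["E Class"])  # 55000
print(prices["E Class"])  # 65000, the original is unchanged
```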