
**Series**
*   Series are built on top of NumPy arrays.
*   They are one-dimensional, labeled arrays that can hold any type of data, including a mix of types in a single Series.





In [None]:
import pandas as pd
import numpy as np
from google.colab import files
import io

list_1 = ['a', 'b', 'c', 'd', 100, False]

list_1



['a', 'b', 'c', 'd', 100, False]

A Series also has an index. By default it is 0, 1, 2 and so on, but you can supply your own labels.

In [None]:
labels = [1,2,3,4,5,6]
ser_1 = pd.Series(data = list_1, index = labels)
ser_1

1        a
2        b
3        c
4        d
5      100
6    False
dtype: object

In [None]:
ser_1[6]  # looks up label 6 (not position 6), which is the last element

False
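Because `ser_1` uses integer labels, `ser_1[6]` is a label lookup, not a position. A minimal sketch that makes the intent explicit with `.loc` (by label) and `.iloc` (by position), rebuilding the same data as `ser_1`:

```python
import pandas as pd

# Same data and labels as ser_1 above
ser = pd.Series(['a', 'b', 'c', 'd', 100, False], index=[1, 2, 3, 4, 5, 6])

by_label = ser.loc[6]       # label 6 -> the last element
by_position = ser.iloc[0]   # position 0 -> the first element (its label is 1)

print(by_label)     # False
print(by_position)  # a
```

Using `.loc` and `.iloc` avoids any ambiguity when the index itself contains integers.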

You can also create a Series from a NumPy array.

In [None]:
arr_1 = np.array([1,2,3,4])
ser_2 = pd.Series(arr_1)

ser_2

0    1
1    2
2    3
3    4
dtype: int64
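When no index is passed, pandas numbers the elements 0 to n-1, and the values stay a NumPy array underneath. A short sketch using the same array as `ser_2`:

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3, 4])
ser = pd.Series(arr)

# With no index given, pandas assigns a default index 0..n-1
print(list(ser.index))        # [0, 1, 2, 3]

# The underlying values can be retrieved as a NumPy array
values = ser.to_numpy()
print(type(values).__name__)  # ndarray
```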

A dictionary can also be used: its keys become the index labels and its values become the data.

In [None]:
dict_1 = {'f_name': 'Joe', 'l_name': 'Smith', 'age': 30}
ser_3 = pd.Series(dict_1)

# now I can get a value by the label
# ser_3['f_name']
ser_3

f_name      Joe
l_name    Smith
age          30
dtype: object
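The commented-out line in the cell above reads a value by its label. A small sketch showing that the dictionary keys become the index and can be used like dictionary keys (same values as `dict_1`):

```python
import pandas as pd

person = pd.Series({'f_name': 'Joe', 'l_name': 'Smith', 'age': 30})

# Dictionary keys become the index labels, in insertion order
print(list(person.index))   # ['f_name', 'l_name', 'age']

# Values are read back by label, just like a dictionary
print(person['f_name'])     # Joe
print(person['age'] + 1)    # 31
```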

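For any Series, the `dtype` attribute gives the data type of its values. `dtype('O')` means `object`, which pandas uses when a Series holds mixed types such as strings and integers.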

In [None]:
ser_3.dtype

dtype('O')
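To see how pandas picks the dtype, here is a sketch comparing three small Series (the values are made up for illustration):

```python
import pandas as pd

ints = pd.Series([1, 2, 3])       # all integers
floats = pd.Series([1, 2.5, 3])   # one float upgrades the whole Series
mixed = pd.Series(['Joe', 30])    # strings and numbers together

print(ints.dtype)    # int64
print(floats.dtype)  # float64
print(mixed.dtype)   # object
```

Numeric dtypes keep NumPy's fast vectorized operations; `object` stores arbitrary Python objects and is generally slower.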

We can also perform arithmetic on a Series; operations are applied element by element.

In [None]:
ser_2 + ser_2

0    2
1    4
2    6
3    8
dtype: int64

In [None]:
ser_2 * ser_2

0     1
1     4
2     9
3    16
dtype: int64
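Besides combining two Series, an operation with a single number is applied to every element, and reductions collapse a Series to one value. A sketch using the same values as `ser_2`:

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.array([1, 2, 3, 4]))

doubled = ser * 2          # the scalar is applied to every element
shifted = ser + 10
print(doubled.tolist())    # [2, 4, 6, 8]
print(shifted.tolist())    # [11, 12, 13, 14]

# Reductions return a single value
print(ser.sum())           # 10
print(ser.mean())          # 2.5
```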

The main difference from plain NumPy arrays is that operations between Series are aligned by index label rather than by position.
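Label alignment is easiest to see with two Series whose indexes only partly overlap. In this sketch (made-up values) only `'b'` and `'c'` appear in both:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Values are matched by label, not by position.
# 'a' and 'd' exist in only one Series, so the result is NaN there.
total = s1 + s2
print(list(total.index))  # ['a', 'b', 'c', 'd']  (union of both indexes)
print(total.tolist())     # [nan, 12.0, 23.0, nan]

# fill_value treats a label missing from one side as 0 instead of giving NaN
filled = s1.add(s2, fill_value=0)
print(filled.tolist())    # a=1, b=12, c=23, d=30
```

With plain NumPy arrays, `[1, 2, 3] + [10, 20, 30]` would simply add position by position; pandas instead pairs up matching labels.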

**DataFrames**

DataFrames are the most commonly used data structure in pandas. A DataFrame is a two-dimensional table made up of multiple Series (one per column) that share the same index.

Each column can hold a different type of data.

DataFrames can be created from NumPy arrays, and dictionaries can be used as well.
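The idea that a DataFrame is a set of Series sharing one index can be checked directly. In this sketch the names, ages and index labels are made up for illustration:

```python
import pandas as pd

idx = ['x', 'y', 'z']
names = pd.Series(['Ann', 'Bob', 'Cy'], index=idx)
ages = pd.Series([31, 25, 47], index=idx)

# Each dictionary entry becomes one column; both columns share the same index
people = pd.DataFrame({'name': names, 'age': ages})

print(type(people['age']).__name__)  # Series -> every column is a Series
print(people['age'].dtype)           # int64 -> each column keeps its own dtype
print(people.loc['y', 'name'])       # Bob
```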

In [None]:
# 2x3 array of random integers from 10 to 49 (the upper bound is exclusive)
arr_2 = np.random.randint(10, 50, size=(2, 3))
arr_2

array([[31, 19, 27],
       [14, 20, 10]])

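We can then create a DataFrame by passing in the array along with the row labels (index) and column labels. The values come from `np.random.randint`, so they change every time the cell above is run; that is why they do not match the array printed above.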

In [None]:
df_1 = pd.DataFrame(arr_2, index=['A', 'B'], columns=['C', 'D', 'E'])
df_1

|   | C | D | E |
|---|---|---|---|
| A | 26 | 35 | 24 |
| B | 29 | 15 | 32 |


In [None]:
df_1['C']

A    26
B    29
Name: C, dtype: int64
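Selecting a column returns a Series; rows and single cells can be selected with `.loc`. Since `arr_2` is random, this sketch hard-codes the values shown in the `df_1` output above:

```python
import numpy as np
import pandas as pd

# Fixed values (copied from the df_1 output above) instead of random ones
df = pd.DataFrame(np.array([[26, 35, 24], [29, 15, 32]]),
                  index=['A', 'B'], columns=['C', 'D', 'E'])

print(df.shape)              # (2, 3) -> 2 rows, 3 columns
print(df.loc['B'].tolist())  # [29, 15, 32]  one row, returned as a Series
print(df.loc['A', 'E'])      # 24            a single cell
print(df[['C', 'E']])        # a list of column labels returns a DataFrame
```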

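Using a dictionary of Series. The index labels are combined, and any label missing from one of the Series is filled with `NaN`.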

In [None]:
dict_3 = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
          'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}

df_2 = pd.DataFrame(dict_3)
df_2

|   | one | two |
|---|---|---|
| a | 1.0 | 1.0 |
| b | 2.0 | 2.0 |
| c | 3.0 | 3.0 |
| d | NaN | 4.0 |
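The `NaN` in row `d` comes from label alignment. A sketch rebuilding `df_2` and showing two common ways to handle the gap, filling it or dropping incomplete rows:

```python
import pandas as pd

df = pd.DataFrame({'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
                   'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])})

# Label 'd' only exists in 'two', so 'one' is NaN in that row
print(df['one'].isna().sum())   # 1

filled = df.fillna(0)           # replace missing values with 0
print(filled.loc['d', 'one'])   # 0.0

print(len(df.dropna()))         # 3 -> row 'd' is dropped
```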


In [None]:
# from_dict accepts a dictionary mapping column labels to lists of values
pd.DataFrame.from_dict({'A': [1, 2, 3], 'B': [4, 5, 6]})


|   | A | B |
|---|---|---|
| 0 | 1 | 4 |
| 1 | 2 | 5 |
| 2 | 3 | 6 |
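By default `from_dict` treats the keys as column labels. Its `orient` parameter can flip that so the keys become row labels instead; a sketch with the same data:

```python
import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6]}

# Default orient='columns': keys become column labels
by_columns = pd.DataFrame.from_dict(data)
print(by_columns.shape)            # (3, 2)

# orient='index': keys become row labels, columns are numbered 0..n-1
by_rows = pd.DataFrame.from_dict(data, orient='index')
print(by_rows.shape)               # (2, 3)
print(by_rows.loc['B'].tolist())   # [4, 5, 6]
```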


**Working with the Customers dataset**

In [None]:
data = files.upload()  # Colab only: opens a file picker and returns {filename: file bytes}

Saving Customers.csv to Customers.csv


In [None]:
# Decode the uploaded bytes to text and parse them as CSV
customer_df = pd.read_csv(io.StringIO(data['Customers.csv'].decode('utf-8')))
customer_df

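|   | CustomerID | Gender | Age | Annual Income ($) | Spending Score (1-100) | Profession | Work Experience | Family Size |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | 19 | 15000 | 39 | Healthcare | 1 | 4 |
| 1 | 2 | Male | 21 | 35000 | 81 | Engineer | 3 | 3 |
| 2 | 3 | Female | 20 | 86000 | 6 | Engineer | 1 | 1 |
| 3 | 4 | Female | 23 | 59000 | 77 | Lawyer | 0 | 2 |
| 4 | 5 | Female | 31 | 38000 | 40 | Entertainment | 2 | 6 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1995 | 1996 | Female | 71 | 184387 | 40 | Artist | 8 | 7 |
| 1996 | 1997 | Female | 91 | 73158 | 32 | Doctor | 7 | 7 |
| 1997 | 1998 | Male | 87 | 90961 | 14 | Healthcare | 9 | 2 |
| 1998 | 1999 | Male | 77 | 182109 | 4 | Executive | 7 | 2 |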
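`files.upload()` only works inside Colab. Elsewhere, `pd.read_csv` can read a file path directly (for example `pd.read_csv('Customers.csv')`, assuming the file sits in the working directory). So that this sketch runs anywhere, it parses a few rows copied from the output above from an in-memory string:

```python
import io
import pandas as pd

# First three rows of Customers.csv (copied from the output above)
csv_text = """CustomerID,Gender,Age,Annual Income ($),Spending Score (1-100),Profession,Work Experience,Family Size
1,Male,19,15000,39,Healthcare,1,4
2,Male,21,35000,81,Engineer,3,3
3,Female,20,86000,6,Engineer,1,1
"""

sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)         # (3, 8) -> rows, columns
print(sample.head(2))       # first two rows
print(sample['Age'].dtype)  # int64 -> numeric columns are parsed as numbers
```

`shape`, `head()` and `dtypes` are quick first checks that a file was parsed as expected.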


Reading Column Data

In [None]:
customer_df['Gender']

0         Male
1         Male
2       Female
3       Female
4       Female
         ...  
1995    Female
1996    Female
1997      Male
1998      Male
1999      Male
Name: Gender, Length: 2000, dtype: object

In [None]:
customer_df[['Gender', 'Age', 'Family Size']]

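|   | Gender | Age | Family Size |
|---|---|---|---|
| 0 | Male | 19 | 4 |
| 1 | Male | 21 | 3 |
| 2 | Female | 20 | 1 |
| 3 | Female | 23 | 2 |
| 4 | Female | 31 | 6 |
| ... | ... | ... | ... |
| 1995 | Female | 71 | 7 |
| 1996 | Female | 91 | 7 |
| 1997 | Male | 87 | 2 |
| 1998 | Male | 77 | 2 |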
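The two cells above differ only in the brackets: a single column name returns a Series, while a list of names (hence the double brackets) returns a DataFrame. A sketch with two rows copied from the customer data:

```python
import pandas as pd

df = pd.DataFrame({'Gender': ['Male', 'Female'],
                   'Age': [19, 20],
                   'Family Size': [4, 1]})

one_col = df['Age']              # single label -> Series
several = df[['Gender', 'Age']]  # list of labels -> DataFrame

print(type(one_col).__name__)    # Series
print(type(several).__name__)    # DataFrame
print(list(several.columns))     # ['Gender', 'Age']
```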


Creating a new column

In [None]:
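customer_df['New'] = 'null'   # adds a column holding the string 'null' in every row
new_df = customer_df          # not a copy: both names refer to the same DataFrame
new_df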

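|   | CustomerID | Gender | Age | Annual Income ($) | Spending Score (1-100) | Profession | Work Experience | Family Size | New |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | 19 | 15000 | 39 | Healthcare | 1 | 4 | null |
| 1 | 2 | Male | 21 | 35000 | 81 | Engineer | 3 | 3 | null |
| 2 | 3 | Female | 20 | 86000 | 6 | Engineer | 1 | 1 | null |
| 3 | 4 | Female | 23 | 59000 | 77 | Lawyer | 0 | 2 | null |
| 4 | 5 | Female | 31 | 38000 | 40 | Entertainment | 2 | 6 | null |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1995 | 1996 | Female | 71 | 184387 | 40 | Artist | 8 | 7 | null |
| 1996 | 1997 | Female | 91 | 73158 | 32 | Doctor | 7 | 7 | null |
| 1997 | 1998 | Male | 87 | 90961 | 14 | Healthcare | 9 | 2 | null |
| 1998 | 1999 | Male | 77 | 182109 | 4 | Executive | 7 | 2 | null |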
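New columns are more often computed from existing ones than filled with a constant. A sketch using three rows of the customer data; the column name `Income per Member` is hypothetical and only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Annual Income ($)': [15000, 35000, 86000],
                   'Family Size': [4, 3, 1]})

# Element-wise division of one column by another, row by row
# ('Income per Member' is a hypothetical column name)
df['Income per Member'] = df['Annual Income ($)'] / df['Family Size']

print(df['Income per Member'].round(2).tolist())  # [3750.0, 11666.67, 86000.0]
```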


Dropping a column

In [None]:
new_df.drop('New', axis=1)  # axis=1 means columns; returns a new DataFrame and leaves new_df unchanged

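|   | CustomerID | Gender | Age | Annual Income ($) | Spending Score (1-100) | Profession | Work Experience | Family Size |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | 19 | 15000 | 39 | Healthcare | 1 | 4 |
| 1 | 2 | Male | 21 | 35000 | 81 | Engineer | 3 | 3 |
| 2 | 3 | Female | 20 | 86000 | 6 | Engineer | 1 | 1 |
| 3 | 4 | Female | 23 | 59000 | 77 | Lawyer | 0 | 2 |
| 4 | 5 | Female | 31 | 38000 | 40 | Entertainment | 2 | 6 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1995 | 1996 | Female | 71 | 184387 | 40 | Artist | 8 | 7 |
| 1996 | 1997 | Female | 91 | 73158 | 32 | Doctor | 7 | 7 |
| 1997 | 1998 | Male | 87 | 90961 | 14 | Healthcare | 9 | 2 |
| 1998 | 1999 | Male | 77 | 182109 | 4 | Executive | 7 | 2 |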


Temporary drop: `drop` returns a new DataFrame, so `new_df` itself still has the `New` column.

In [None]:
new_df

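|   | CustomerID | Gender | Age | Annual Income ($) | Spending Score (1-100) | Profession | Work Experience | Family Size | New |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | 19 | 15000 | 39 | Healthcare | 1 | 4 | null |
| 1 | 2 | Male | 21 | 35000 | 81 | Engineer | 3 | 3 | null |
| 2 | 3 | Female | 20 | 86000 | 6 | Engineer | 1 | 1 | null |
| 3 | 4 | Female | 23 | 59000 | 77 | Lawyer | 0 | 2 | null |
| 4 | 5 | Female | 31 | 38000 | 40 | Entertainment | 2 | 6 | null |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1995 | 1996 | Female | 71 | 184387 | 40 | Artist | 8 | 7 | null |
| 1996 | 1997 | Female | 91 | 73158 | 32 | Doctor | 7 | 7 | null |
| 1997 | 1998 | Male | 87 | 90961 | 14 | Healthcare | 9 | 2 | null |
| 1998 | 1999 | Male | 77 | 182109 | 4 | Executive | 7 | 2 | null |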


Permanent drop: with `inplace=True`, `drop` modifies the DataFrame itself and returns `None`.

In [None]:
new_df.drop('New', axis=1, inplace=True)  # also removes it from customer_df, which is the same object

In [None]:
new_df

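|   | CustomerID | Gender | Age | Annual Income ($) | Spending Score (1-100) | Profession | Work Experience | Family Size |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | 19 | 15000 | 39 | Healthcare | 1 | 4 |
| 1 | 2 | Male | 21 | 35000 | 81 | Engineer | 3 | 3 |
| 2 | 3 | Female | 20 | 86000 | 6 | Engineer | 1 | 1 |
| 3 | 4 | Female | 23 | 59000 | 77 | Lawyer | 0 | 2 |
| 4 | 5 | Female | 31 | 38000 | 40 | Entertainment | 2 | 6 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1995 | 1996 | Female | 71 | 184387 | 40 | Artist | 8 | 7 |
| 1996 | 1997 | Female | 91 | 73158 | 32 | Doctor | 7 | 7 |
| 1997 | 1998 | Male | 87 | 90961 | 14 | Healthcare | 9 | 2 |
| 1998 | 1999 | Male | 77 | 182109 | 4 | Executive | 7 | 2 |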
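Because `new_df = customer_df` does not copy anything, the in-place drop above also changed `customer_df` (which is why the `New` column is gone from it in the next section). A sketch contrasting a plain assignment with `.copy()`, using a small made-up DataFrame:

```python
import pandas as pd

# Case 1: plain assignment creates a second name for the SAME DataFrame
first = pd.DataFrame({'Age': [19, 21], 'New': ['null', 'null']})
alias = first
alias.drop('New', axis=1, inplace=True)
print(list(first.columns))        # ['Age']  -> first changed as well

# Case 2: .copy() creates an independent DataFrame
second = pd.DataFrame({'Age': [19, 21], 'New': ['null', 'null']})
independent = second.copy()
independent.drop('New', axis=1, inplace=True)
print(list(second.columns))       # ['Age', 'New']  -> second is untouched
print(list(independent.columns))  # ['Age']
```

Reassigning the result, e.g. `df = df.drop(columns='New')`, is a common alternative to `inplace=True` and makes it explicit which variable changes.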


Drop a row

In [None]:
customer_df

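|   | CustomerID | Gender | Age | Annual Income ($) | Spending Score (1-100) | Profession | Work Experience | Family Size |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | 19 | 15000 | 39 | Healthcare | 1 | 4 |
| 1 | 2 | Male | 21 | 35000 | 81 | Engineer | 3 | 3 |
| 2 | 3 | Female | 20 | 86000 | 6 | Engineer | 1 | 1 |
| 3 | 4 | Female | 23 | 59000 | 77 | Lawyer | 0 | 2 |
| 4 | 5 | Female | 31 | 38000 | 40 | Entertainment | 2 | 6 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1995 | 1996 | Female | 71 | 184387 | 40 | Artist | 8 | 7 |
| 1996 | 1997 | Female | 91 | 73158 | 32 | Doctor | 7 | 7 |
| 1997 | 1998 | Male | 87 | 90961 | 14 | Healthcare | 9 | 2 |
| 1998 | 1999 | Male | 77 | 182109 | 4 | Executive | 7 | 2 |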


Temporary drop

In [None]:
customer_df.drop(1, axis=0)  # axis=0 means rows; drops the row labelled 1 from the returned copy only

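|   | CustomerID | Gender | Age | Annual Income ($) | Spending Score (1-100) | Profession | Work Experience | Family Size |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | 19 | 15000 | 39 | Healthcare | 1 | 4 |
| 2 | 3 | Female | 20 | 86000 | 6 | Engineer | 1 | 1 |
| 3 | 4 | Female | 23 | 59000 | 77 | Lawyer | 0 | 2 |
| 4 | 5 | Female | 31 | 38000 | 40 | Entertainment | 2 | 6 |
| 5 | 6 | Female | 22 | 58000 | 76 | Artist | 0 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1995 | 1996 | Female | 71 | 184387 | 40 | Artist | 8 | 7 |
| 1996 | 1997 | Female | 91 | 73158 | 32 | Doctor | 7 | 7 |
| 1997 | 1998 | Male | 87 | 90961 | 14 | Healthcare | 9 | 2 |
| 1998 | 1999 | Male | 77 | 182109 | 4 | Executive | 7 | 2 |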


In [None]:
customer_df

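|   | CustomerID | Gender | Age | Annual Income ($) | Spending Score (1-100) | Profession | Work Experience | Family Size |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | 19 | 15000 | 39 | Healthcare | 1 | 4 |
| 1 | 2 | Male | 21 | 35000 | 81 | Engineer | 3 | 3 |
| 2 | 3 | Female | 20 | 86000 | 6 | Engineer | 1 | 1 |
| 3 | 4 | Female | 23 | 59000 | 77 | Lawyer | 0 | 2 |
| 4 | 5 | Female | 31 | 38000 | 40 | Entertainment | 2 | 6 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1995 | 1996 | Female | 71 | 184387 | 40 | Artist | 8 | 7 |
| 1996 | 1997 | Female | 91 | 73158 | 32 | Doctor | 7 | 7 |
| 1997 | 1998 | Male | 87 | 90961 | 14 | Healthcare | 9 | 2 |
| 1998 | 1999 | Male | 77 | 182109 | 4 | Executive | 7 | 2 |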


Permanent drop

In [None]:
customer_df.drop(1, axis=0, inplace=True)  # removes the row labelled 1 from customer_df itself

In [None]:
customer_df

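|   | CustomerID | Gender | Age | Annual Income ($) | Spending Score (1-100) | Profession | Work Experience | Family Size |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | 19 | 15000 | 39 | Healthcare | 1 | 4 |
| 2 | 3 | Female | 20 | 86000 | 6 | Engineer | 1 | 1 |
| 3 | 4 | Female | 23 | 59000 | 77 | Lawyer | 0 | 2 |
| 4 | 5 | Female | 31 | 38000 | 40 | Entertainment | 2 | 6 |
| 5 | 6 | Female | 22 | 58000 | 76 | Artist | 0 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1995 | 1996 | Female | 71 | 184387 | 40 | Artist | 8 | 7 |
| 1996 | 1997 | Female | 91 | 73158 | 32 | Doctor | 7 | 7 |
| 1997 | 1998 | Male | 87 | 90961 | 14 | Healthcare | 9 | 2 |
| 1998 | 1999 | Male | 77 | 182109 | 4 | Executive | 7 | 2 |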
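After dropping a row, the index keeps a gap where label 1 used to be. A sketch (made-up CustomerIDs) that drops several rows at once with the `index=` keyword and then renumbers the index with `reset_index`:

```python
import pandas as pd

df = pd.DataFrame({'CustomerID': [1, 2, 3, 4, 5]})

# Drop several rows at once by index label
trimmed = df.drop(index=[1, 3])
print(trimmed.index.tolist())             # [0, 2, 4] -> gaps remain

# reset_index renumbers 0..n-1; drop=True discards the old labels
renumbered = trimmed.reset_index(drop=True)
print(renumbered.index.tolist())          # [0, 1, 2]
print(renumbered['CustomerID'].tolist())  # [1, 3, 5]
```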


Fetching row data using:

*   `loc`, which selects rows by index label
*   `iloc`, which selects rows by integer position



In [None]:
cus_4_ser = customer_df.loc[4]  # the row whose label is 4 (CustomerID 5)
cus_4_ser

CustomerID                            5
Gender                           Female
Age                                  31
Annual Income ($)                 38000
Spending Score (1-100)               40
Profession                Entertainment
Work Experience                       2
Family Size                           6
Name: 4, dtype: object

In [None]:
customer_df.iloc[1:4]  # positions 1 to 3; label 1 was dropped, so these are the rows labelled 2, 3 and 4

|   | CustomerID | Gender | Age | Annual Income ($) | Spending Score (1-100) | Profession | Work Experience | Family Size |
|---|---|---|---|---|---|---|---|---|
| 2 | 3 | Female | 20 | 86000 | 6 | Engineer | 1 | 1 |
| 3 | 4 | Female | 23 | 59000 | 77 | Lawyer | 0 | 2 |
| 4 | 5 | Female | 31 | 38000 | 40 | Entertainment | 2 | 6 |
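Once a label has been dropped, `loc` and `iloc` can return different rows for the same number. A sketch reproducing that situation on a small made-up DataFrame, also showing that `loc` slices include the end label while `iloc` slices exclude the end position:

```python
import pandas as pd

df = pd.DataFrame({'CustomerID': [1, 2, 3, 4, 5, 6]})
df = df.drop(index=1)   # same situation as customer_df: label 1 is gone

print(df.loc[2, 'CustomerID'])    # 3 -> the row labelled 2
print(df.iloc[2]['CustomerID'])   # 4 -> the third row by position

# loc includes the end label; iloc excludes the end position
print(df.loc[2:4].index.tolist())   # [2, 3, 4]
print(df.iloc[1:4].index.tolist())  # [2, 3, 4]
```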


Conditional Selection

In [None]:
small_list = customer_df.iloc[0:25]  # first 25 rows by position (a DataFrame, despite the name)
small_list

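|   | CustomerID | Gender | Age | Annual Income ($) | Spending Score (1-100) | Profession | Work Experience | Family Size |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | 19 | 15000 | 39 | Healthcare | 1 | 4 |
| 2 | 3 | Female | 20 | 86000 | 6 | Engineer | 1 | 1 |
| 3 | 4 | Female | 23 | 59000 | 77 | Lawyer | 0 | 2 |
| 4 | 5 | Female | 31 | 38000 | 40 | Entertainment | 2 | 6 |
| 5 | 6 | Female | 22 | 58000 | 76 | Artist | 0 | 2 |
| 6 | 7 | Female | 35 | 31000 | 6 | Healthcare | 1 | 3 |
| 7 | 8 | Female | 23 | 84000 | 94 | Healthcare | 1 | 3 |
| 8 | 9 | Male | 64 | 97000 | 3 | Engineer | 0 | 3 |
| 9 | 10 | Female | 30 | 98000 | 72 | Artist | 1 | 4 |
| 10 | 11 | Male | 67 | 7000 | 14 | Engineer | 1 | 3 |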


In [None]:
small_list['Age'] > 30  # element-wise comparison returns a boolean Series (a mask)

0     False
2     False
3     False
4      True
5     False
6      True
7     False
8      True
9     False
10     True
11     True
12     True
13    False
14     True
15    False
16     True
17    False
18     True
19     True
20     True
21    False
22     True
23     True
24     True
25    False
Name: Age, dtype: bool

In [None]:
small_list[small_list['Age'] > 30]  # keeps only the rows where the mask is True

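|   | CustomerID | Gender | Age | Annual Income ($) | Spending Score (1-100) | Profession | Work Experience | Family Size |
|---|---|---|---|---|---|---|---|---|
| 4 | 5 | Female | 31 | 38000 | 40 | Entertainment | 2 | 6 |
| 6 | 7 | Female | 35 | 31000 | 6 | Healthcare | 1 | 3 |
| 8 | 9 | Male | 64 | 97000 | 3 | Engineer | 0 | 3 |
| 10 | 11 | Male | 67 | 7000 | 14 | Engineer | 1 | 3 |
| 11 | 12 | Female | 35 | 93000 | 99 | Healthcare | 4 | 4 |
| 12 | 13 | Female | 58 | 80000 | 15 | Executive | 0 | 5 |
| 14 | 15 | Male | 37 | 19000 | 13 | Doctor | 0 | 1 |
| 16 | 17 | Female | 35 | 29000 | 35 | Homemaker | 9 | 5 |
| 18 | 19 | Male | 52 | 20000 | 29 | Entertainment | 1 | 4 |
| 19 | 20 | Female | 35 | 62000 | 98 | Artist | 0 | 1 |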
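Masks can be combined to express more specific conditions. A sketch on five rows copied from the customer data, using `&` (and), a `.loc` mask plus a column list, and `isin` for membership:

```python
import pandas as pd

df = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
    'Age': [19, 31, 35, 64, 58],
    'Profession': ['Healthcare', 'Entertainment', 'Healthcare', 'Engineer', 'Executive'],
})

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
older_women = df[(df['Age'] > 30) & (df['Gender'] == 'Female')]
print(older_women['Age'].tolist())   # [31, 35, 58]

# .loc takes a boolean mask for the rows and a list of columns
print(df.loc[df['Age'] > 50, ['Gender', 'Profession']].shape)  # (2, 2)

# isin keeps rows whose value appears in the given list
medical = df[df['Profession'].isin(['Healthcare', 'Doctor'])]
print(len(medical))                  # 2
```

Python's `and`/`or` do not work element-wise on Series, which is why pandas uses `&` and `|` here.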
