## Building a Model

In [1]:
import pandas as pd

Below is the DataFrame we will be using in this lesson:

In [32]:
import pandas as pd

info = {
    'Name': ['Aurora', 'Fujii', 'Shreya', 'Alec', 'Mia', 'Wei', 'Samir', 'Chloe', 'Daniel', 'Fatima'],
    'Age': [25, 17, 37, 29, 33, 45, 22, 19, 40, 28],
    'City': ['Stavanger', 'Tokyo', 'Mumbai', 'Los Angeles', 'Berlin', 'Shanghai', 'Cairo', 'Sydney', 'New York', 'Paris'],
    'Country': ['Norway', 'Japan', 'India', 'USA', 'Germany', 'China', 'Egypt', 'Australia', 'USA', 'France'],
    'Job-Title': ['Engineer', 'Student', 'Manager', 'Analyst', 'Designer', 'Executive', 'Intern', 'Student', 'Director', 'Consultant'],
    'Years_Experience': [3, 0, 10, 5, 8, 20, 0, 0, 15, 6],
}

Df = pd.DataFrame(info, index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], columns=['Name', 'Age', 'City', 'Country', 'Job-Title', 'Years_Experience'])

print(Df)

      Name  Age         City    Country   Job-Title  Years_Experience
1   Aurora   25    Stavanger     Norway    Engineer                 3
2    Fujii   17        Tokyo      Japan     Student                 0
3   Shreya   37       Mumbai      India     Manager                10
4     Alec   29  Los Angeles        USA     Analyst                 5
5      Mia   33       Berlin    Germany    Designer                 8
6      Wei   45     Shanghai      China   Executive                20
7    Samir   22        Cairo      Egypt      Intern                 0
8    Chloe   19       Sydney  Australia     Student                 0
9   Daniel   40     New York        USA    Director                15
10  Fatima   28        Paris     France  Consultant                 6


---


### Ways to select a subset of data to choose **features**
There are several ways to do this; for now we will cover two approaches:

1. **Dot Notation**
    - Syntax = *DataFrame.ColumnName*

    - This selects a single column as a pandas Series. It is a shorthand that treats the column name as an attribute of the DataFrame object, so it only works when the column name is a valid Python identifier (no spaces or special characters).

    - It raises an error if the column name has spaces (like "First Name") or special characters (like "price-INR"). If the name conflicts with a built-in DataFrame attribute or method (like a column named "count"), no error is raised, but you get the method back instead of the column.


2. **Selecting with a Single Column Key**
    - Syntax = *DataFrame['Column Name']*

    - The quoted string tells pandas to look up that exact key among the columns, much like looking up a key in a Python dictionary. This is the standard, robust method.

    - This works in every case where dot notation fails. The only cost is typing the brackets and quotes each time, which makes it slightly longer than dot notation. Selecting a single column this way returns a Series; passing a list, as in *DataFrame[['Column Name 1', 'Column Name 2']]*, returns a DataFrame.


In [None]:
#Dot Notation Method  (fails here: Python reads this as "Df.Job minus Title", so the hyphenated column is never found)
y= Df.Job-Title
print(y)

AttributeError: 'DataFrame' object has no attribute 'Job'

In [34]:
#Selection with Column Key Method
y= Df['Job-Title']
print(y)

1       Engineer
2        Student
3        Manager
4        Analyst
5       Designer
6      Executive
7         Intern
8        Student
9       Director
10    Consultant
Name: Job-Title, dtype: object
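
Two points from the list above are not exercised by the cells so far: a column whose name collides with a DataFrame method, and the Series versus DataFrame return type. The following is a minimal sketch of both, using a hypothetical three-row frame called `toy` (its column names and values are made up for illustration and are not part of the lesson's data).

```python
import pandas as pd

# Hypothetical toy frame: 'count' collides with the built-in DataFrame.count method,
# 'price-INR' contains a special character, and 'Age' is a valid identifier.
toy = pd.DataFrame({
    'Age': [25, 17, 37],
    'count': [1, 2, 3],
    'price-INR': [10, 20, 30],
})

# Dot notation works when the name is a valid identifier and nothing shadows it.
same_column = toy.Age.equals(toy['Age'])

# Dot notation silently returns the method, not the column, on a name collision.
dot_result = toy.count
key_result = toy['count']

# Single brackets give a Series; a list inside the brackets gives a DataFrame.
single = toy['price-INR']
double = toy[['price-INR']]

print(same_column)
print(callable(dot_result), type(key_result).__name__)
print(type(single).__name__, type(double).__name__)
```

Because the name collision raises no error, it is the easiest of the three failure modes to miss, which is one reason to prefer the bracket form in code that will be reused.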


Columns that are fed into our model and used to make predictions are called **"features"**. We can select multiple features by passing a list of column names (as quoted strings) inside the brackets.

In [43]:
Career_features = ['Name', 'Job-Title', 'Years_Experience']
x = Df[Career_features]

#Printing the first 5 rows to confirm it worked
print(x.head())

#Getting summary statistics of the features
x.describe()

     Name Job-Title  Years_Experience
1  Aurora  Engineer                 3
2   Fujii   Student                 0
3  Shreya   Manager                10
4    Alec   Analyst                 5
5     Mia  Designer                 8


       Years_Experience
count         10.000000
mean           6.700000
std            6.750309
min            0.000000
25%            0.750000
50%            5.500000
75%            9.500000
max           20.000000
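
The summary above covers only `Years_Experience` because `describe()` summarises numeric columns by default when the frame contains any. As a small sketch, passing `include='all'` also reports text columns such as `Name` and `Job-Title`. The frame below is a reduced copy of the first five rows shown earlier, rebuilt here only so the block runs on its own.

```python
import pandas as pd

# Reduced copy of the lesson's features (first five rows only), rebuilt for a standalone example.
x = pd.DataFrame({
    'Name': ['Aurora', 'Fujii', 'Shreya', 'Alec', 'Mia'],
    'Job-Title': ['Engineer', 'Student', 'Manager', 'Analyst', 'Designer'],
    'Years_Experience': [3, 0, 10, 5, 8],
}, index=[1, 2, 3, 4, 5])

# Default behaviour: only the numeric column is summarised.
numeric_summary = x.describe()

# include='all' adds rows such as 'unique', 'top' and 'freq' for the text columns.
full_summary = x.describe(include='all')

print(list(numeric_summary.columns))
print(list(full_summary.columns))
print(full_summary.loc['unique', 'Job-Title'])
```

Statistics that do not apply to a column type (for example `mean` for `Name`) appear as `NaN` in the combined summary.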


---
### Model Building