# <h1 style="text-align: center;" class="list-group-item list-group-item-action active" data-toggle="list" role="tab" aria-controls="home">Introduction to Python</h1>

**Course Description**

Python is a general-purpose programming language that is becoming ever more popular for data science. Companies worldwide are using Python to harvest insights from their data and gain a competitive edge. Unlike other Python tutorials, this course focuses on Python specifically for data science. In our Introduction to Python course, you’ll learn about powerful ways to store and manipulate data, and helpful data science tools to begin conducting your own analyses. Start DataCamp’s online Python curriculum now.

<a id="toc"></a>

<h3 class="list-group-item list-group-item-action active" data-toggle="list" role="tab" aria-controls="home">Table of Contents</h3>
    
* [1. Python basics](#1)
    - Hello Python
    - Variables and Types

* [2. Python Lists](#2) 
    - Python Lists
    - List of Lists
    - Manipulating Lists
    
* [3. Functions and Packages](#3)
    - Functions
    - Methods
    - Packages
    
* [4. Numpy](#4)
    - Numpy
    - 2D Numpy Arrays
    - Centering and scaling
    - Numpy: Basic Statistics

**Explore Datasets**

Use the arrays imported in the first cell to explore the data and practice your skills!

- Print out the weight of the first ten baseball players.
- What is the median weight of all baseball players in the data?
- Print out the names of all players with a height greater than 80 (heights are in inches).
- Who is taller on average? Baseball players or soccer players? Keep in mind that baseball heights are stored in inches!
- The values in soccer_shooting are decimals. Convert them to whole numbers (e.g., 0.98 becomes 98).
- Do taller players get higher ratings? Calculate the correlation between soccer_ratings and soccer_heights to find out!
- What is the average rating for attacking players ('A')?

In [61]:
# Importing the course packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

# Importing the course datasets 
df_baseball = pd.read_csv('datasets/baseball.csv')
df_soccer = pd.read_csv('datasets/soccer.csv')

In [62]:
df_baseball.head()

Unnamed: 0,Name,Team,Position,Height,Weight,Age,PosCategory
0,Adam_Donachie,BAL,Catcher,74,180,22.99,Catcher
1,Paul_Bako,BAL,Catcher,74,215,34.69,Catcher
2,Ramon_Hernandez,BAL,Catcher,72,210,30.78,Catcher
3,Kevin_Millar,BAL,First_Baseman,72,210,35.43,Infielder
4,Chris_Gomez,BAL,First_Baseman,73,188,35.71,Infielder


In [63]:
df_soccer.head()

Unnamed: 0,id,name,rating,position,height,foot,rare,pace,shooting,passing,dribbling,defending,heading,diving,handling,kicking,reflexes,speed,positioning
0,1001,Gábor Király,69,GK,191,Right,0,,,,,,,70.0,66.0,63.0,74.0,35.0,66.0
1,100143,Frederik Boi,65,M,184,Right,0,61.0,0.65,63.0,59.0,62.0,62.0,,,,,,
2,100264,Tomasz Szewczuk,57,A,185,Right,0,65.0,0.54,43.0,53.0,55.0,74.0,,,,,,
3,100325,Steeve Joseph-Reinette,63,D,180,Left,0,68.0,0.38,51.0,46.0,64.0,71.0,,,,,,
4,100326,Kamel Chafni,72,M,181,Right,0,75.0,0.64,67.0,72.0,57.0,66.0,,,,,,


In [64]:
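# Ten heaviest players; for the first ten rows instead, use df_baseball[['Name', 'Weight']].head(10)
df_baseball.sort_values(by='Weight', ascending=False)[['Name','Weight']][:10]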

Unnamed: 0,Name,Weight
154,C.C._Sabathia,290
229,Chris_Britton,278
61,Bobby_Jenks,270
59,Andrew_Sisco,260
909,Jon_Rauch,260
815,Prince_Fielder,260
458,Boof_Bonser,260
890,Mike_Restovich,257
531,Carlos_Zambrano,255
567,Jose_Valverde,254


In [65]:
df_baseball.Weight.median()
#np.median(df_baseball.Weight)
#df_baseball.describe()['Weight']

200.0

In [66]:
df_baseball[df_baseball['Height']>80]['Name']

59         Andrew_Sisco
558       Randy_Johnson
764    Mark_Hendrickson
862         Chris_Young
909           Jon_Rauch
Name: Name, dtype: object

In [67]:
# 1 inch = 0.0254 m, so inches * 0.0254 * 100 gives centimeters

print(df_baseball['Height'].mean()*0.0254*100)
print(df_soccer['height'].mean())

187.17172413793102
181.75042387249914


In [68]:
df_soccer.shooting.value_counts(dropna=False)

NaN     930
0.60    293
0.58    280
0.54    258
0.64    253
       ... 
0.86      2
0.12      1
0.13      1
0.90      1
0.14      1
Name: shooting, Length: 80, dtype: int64

In [69]:
df_soccer['shooting'] = df_soccer['shooting'].apply(lambda x: x*100)
df_soccer.shooting.value_counts(dropna=False)  # dropna=False keeps NaN in the counts

NaN     930
60.0    293
58.0    280
54.0    258
64.0    253
       ... 
86.0      2
12.0      1
13.0      1
90.0      1
14.0      1
Name: shooting, Length: 80, dtype: int64

In [70]:
df_soccer['rating'].corr(df_soccer['height'])

-0.006108577058543698

In [71]:
df_soccer[df_soccer['position']=='A']['rating'].mean()

67.26080691642652

## <a id="1"></a>
<font color="lightseagreen" size=+2.5><b>1. Python Basics</b></font>

<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Table of Contents</a>

An introduction to the basic concepts of Python. Learn how to use Python interactively and by using a script. Create your first variables and acquaint yourself with Python's basic data types.

### 1. Hello Python!

Hi, my name is Hugo and I'll be your host for Introduction to Python for Data Science. I'm a data scientist and educator at DataCamp and host of the DataFramed podcast, which you must check out.

**2. How you will learn**

![image.png](attachment:image.png)

In this course, you will learn Python for Data Science through video lessons, like this one, and interactive exercises. You get your own Python session where you can experiment and try to come up with the correct code to solve the instructions. You're learning by doing, while receiving customized and instant feedback on your work.

**3. Python**

![image-2.png](attachment:image-2.png)

For the latest version, see https://www.python.org/downloads/

Python was conceived by Guido van Rossum. Here, you can see a photo of me with Guido. What started as a hobby project soon became a general-purpose programming language: nowadays, you can use Python to build practically any piece of software. But how did this happen? Well, first of all, Python is open source. It's free to use. Second, it's very easy to build packages in Python, which are pieces of code that you can share with other people to solve specific problems. Over time, more and more of these packages specifically built for data science have been developed. Suppose you want to make some fancy visualizations of your company's sales. There's a package for that. Or what about connecting to a database to analyze sensor measurements? There's also a package for that. People often refer to Python as the Swiss Army knife of programming languages, as you can do almost anything with it. In this course, we'll start to build up your data science coding skills bit by bit, so make sure to stick around to see how powerful the language can be. Our courses focus on Python 3. To install Python 3 on your own system, follow the steps at this URL.

**4. IPython Shell**

![image-3.png](attachment:image-3.png)

Now that you're all eyes and ears for Python, let's start experimenting. I'll start with the

**5. IPython Shell**

![image-4.png](attachment:image-4.png)

Python shell, a place where you can type Python code and immediately see the results. In DataCamp's exercise interface, this shell is embedded here. Let's start off simple and use Python as a calculator.

**6. IPython Shell**

![image-5.png](attachment:image-5.png)

Let me type 4 + 5, and hit Enter. Python interprets what you typed and prints the result of your calculation, 9. The Python shell that's used here is actually not the original one; we're using IPython, short for Interactive Python, which is a juiced-up version of regular Python that'll be useful later on. IPython was created by Fernando Pérez and is part of the broader Jupyter ecosystem. Apart from interactively working with Python, you can also have Python run so-called

**7. Python Script**

![image-6.png](attachment:image-6.png)

Python scripts. These Python scripts are simply text files with the extension .py. A script is basically a list of Python commands that are executed, almost as if you were typing the commands in the shell yourself, line by line.

**8. Python Script**

![image-7.png](attachment:image-7.png)

Let's put the command from before in a script now, which can be found here in DataCamp's interface. The next step is executing the script, by clicking 'Submit Answer'. If you execute this script in the DataCamp interface, there's nothing in the output pane. That's because you have to explicitly use print inside scripts if you want to generate output during execution.

**9. Python Script**

![image-8.png](attachment:image-8.png)

Let's wrap our previous calculation in a print call, and rerun the script. This time, the same output as before is generated, great! Putting your code in Python scripts instead of manually retyping every step interactively will help you to keep structure and avoid retyping everything over and over again if you want to make a change; you simply make the change in the script, and rerun the entire thing.
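As a minimal sketch of the difference described above (the file name `script.py` and the variable `result` are only illustrative), a bare expression in a script is evaluated but its value is discarded, while wrapping it in `print()` writes the value to the output:

```python
# script.py (hypothetical file name)

# Evaluated when the script runs, but nothing is shown
4 + 5

# Wrapped in print(), the result appears in the output
result = 4 + 5
print(result)
```

In the interactive shell, by contrast, typing `4 + 5` alone is enough to see the result.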

**10. DataCamp Interface**

![image-9.png](attachment:image-9.png)

Now that you've got an idea about different ways of working with Python, I suggest you head over to the exercises. Use the IPython Shell for experimentation, and use the Python script editor to code the actual answer. If you click Submit Answer, your script will be executed and checked for correctness.

**11. Let's practice!**

Get coding and don't forget to have fun!

![image.png](attachment:image.png)
Correct! Python is an extremely flexible language.

### Exercise

**The Python Interface**

Hit Run Code to run your first Python code with DataCamp and see the output!

Notice the script.py window; this is where you can type Python code to solve exercises. You can hit Run Code and Submit Answer as often as you want. If you're stuck, you can click Get Hint, and ultimately Get Solution.

You can also use the IPython Shell interactively by typing commands and hitting Enter. Here, your code will not be checked for correctness so it is a great way to experiment.

**Instructions**

- Experiment in the IPython Shell; type 5 / 8, for example.
- Add another line of code to script.py, print(7 + 10), to be checked for correctness.
- Hit Submit Answer to execute the Python script and receive feedback.

In [72]:
# Example, do not modify!
print(5 / 8)

# Print the sum of 7 and 10
print(7+10)

0.625
17


### Exercise

**Any comments?**

You can also add comments to your Python scripts. Comments help you and others understand what your code is about; they are not run as Python code.

Comments start with the # character. See the comment in the editor, # Division; now it's your turn to add a comment!

**Instructions**

- Above the print(7 + 10), add the comment # Addition

In [73]:
# Division
print(5 / 8)

# Addition
print(7 + 10)

0.625
17


### Exercise

**Python as a calculator**

Python is perfectly suited to do basic calculations. It can do addition, subtraction, multiplication and division.

The code in the script gives some examples.

Now it's your turn to practice!

**Instructions**

- Print the sum of 4 + 5.
- Print the result of subtracting 5 from 5.
- Print the result of multiplying 3 by 5.
- Print the result of dividing 10 by 2.

In [74]:
# Addition
print(4+5)

# Subtraction
print(5-5)

# Multiplication
print(3*5)

# Division
print(10/2)

9
0
15
5.0


### 1. Variables and Types

Well done and welcome back! It's clear that Python is a great calculator. If you want to do more complex calculations though, you will want to "save" values while you're coding along.

**2. Variable**

![image.png](attachment:image.png)

You can do this by defining a variable, with a specific, case-sensitive name. Once you create (or declare) such a variable, you can later call up its value by typing the variable name. Suppose you measure your height and weight, in metric units: you are 1-point-79 meters tall, and weigh 68-point-7 kilograms. You can assign these values to two variables, named height and weight, with an equals sign: If you now type the name of the variable, height, Python looks for the variable name, retrieves its value, and prints it out.
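The slide image is not reproduced here, so the following is a small sketch of the assignment just described, using the values quoted in the text:

```python
# Values from the text: height in meters, weight in kilograms
height = 1.79
weight = 68.7

# In an interactive shell, typing the bare name shows its value;
# in a script, print() is needed to see it
print(height)
```

Names are case-sensitive, so `Height` would be a different, undefined variable here.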

**3. Calculate BMI**

![image-2.png](attachment:image-2.png)

Let's now calculate the Body Mass Index, or BMI, which is calculated as follows, with weight in kilograms and height in meters. You can do this with the actual values, but you can just as well use the variables height and weight, like here. Every time you type the variable's name, you are asking Python to replace it with the actual value of the variable. weight corresponds to 68-point-7, and height to 1-point-79. Finally, this version has Python store the result in a new variable, bmi. bmi now contains the same value as the one you calculated earlier. In Python, variables are used all the time. They help to make your code reproducible.
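The formula on the slide is not reproduced here. Assuming the standard definition of BMI (weight in kilograms divided by the square of height in meters), the calculation can be sketched as:

```python
height = 1.79  # meters
weight = 68.7  # kilograms

# BMI = weight / height squared; ** is Python's power operator
bmi = weight / height ** 2
print(bmi)
```

Because `**` binds more tightly than `/`, no parentheses are needed around `height ** 2`.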

**4. Reproducibility**

![image-3.png](attachment:image-3.png)

Suppose the code to create the height, weight and bmi variables is in a script, like this. If you now want to recalculate the bmi for another weight,

**5. Reproducibility**

![image-4.png](attachment:image-4.png)

you can simply change the declaration of the weight variable, and rerun the script. The bmi changes accordingly, because the value of the variable weight has changed as well. So far, we've only worked with numerical values, such as height and weight.

**6. Python Types**

![image-5.png](attachment:image-5.png)

In Python, these numbers all have a specific type. You can check out the type of a value with the type function. To see the type of our bmi value, simply write type and then bmi inside parentheses. You can see that it's a float, which is Python's way of representing a real number, so a number which can have both an integer part and a fractional part. Python also has a type for integers: int, like this example. To do data science, you'll need more than ints and floats, though.
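A short sketch of these `type()` calls follows; the variable `day_of_week` and its value are assumptions made up for illustration:

```python
bmi = 68.7 / 1.79 ** 2
print(type(bmi))          # <class 'float'>

day_of_week = 5           # hypothetical integer example
print(type(day_of_week))  # <class 'int'>
```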

**7. Python Types (2)**

![image-6.png](attachment:image-6.png)

Python features tons of other data types. The most common ones are strings and booleans. A string is Python's way to represent text. You can use both double and single quotes to build a string, as you can see from these examples. If you print the type of the last variable here, you see that it's str, short for string. The Boolean is a type that can either be True or False. You can think of it as 'Yes' and 'No' in everyday language. Booleans will be very useful in the future, to perform filtering operations on your data for example. There's something special about Python data types.
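The examples on the slide are not reproduced here; as a sketch with made-up variable names and values, strings and booleans look like this:

```python
# Strings can use double or single quotes
x = "body mass index"
y = 'this works too'
print(type(y))  # <class 'str'>

# Booleans are written True and False (capitalized)
z = True
print(type(z))  # <class 'bool'>
```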

**8. Python Types (3)**

![image-7.png](attachment:image-7.png)

Have a look at this line of code, that sums two integers, and then this line of code, that sums two strings. For the integers, the values were summed, while for the strings, the strings were pasted together. The plus operator behaved differently for different data types. This is a general principle: how the code behaves depends on the types you're working with. In the exercises that follow, you'll create your first variables and experiment with some of Python's data types. I'll see you in the next video to explain all about lists.
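As an illustration of this type-dependent behavior (the specific values and the names `int_sum` and `str_sum` are assumptions, not necessarily the ones on the slide):

```python
# With integers, + adds the values
int_sum = 2 + 3
print(int_sum)  # 5

# With strings, + pastes them together (concatenation)
str_sum = 'ab' + 'cd'
print(str_sum)  # abcd
```

Mixing the two, as in `2 + 'ab'`, raises a `TypeError`, which is why explicit type conversion matters later on.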

**9. Let's practice!**

Let's get you coding and I can't wait to see you in the next chapter where you'll build even more awesome python charts.

### Exercise

**Variable Assignment**

In Python, a variable allows you to refer to a value with a name. To create a variable x with a value of 5, you use =, like this example:

x = 5

You can now use the name of this variable, x, instead of the actual value, 5.

Remember, = in Python means assignment, it doesn't test equality!
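To make the distinction concrete, here is a small sketch; the name `is_five` is only illustrative. A single `=` assigns, while the separate `==` operator tests equality:

```python
x = 5               # assignment: the name x now refers to 5
is_five = (x == 5)  # comparison: evaluates to a boolean
print(is_five)      # True
```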

**Instructions**

- Create a variable savings with the value of 100.
- Check out this variable by typing print(savings) in the script.

In [75]:
# Create a variable savings
savings = 100

# Print out savings
print(savings)

100


### Exercise

**Calculations with variables**

You've now created a savings variable, so let's start saving!

Instead of calculating with the actual values, you can use variables instead. The savings variable you created in the previous exercise with a value of 100 is available to you.

How much money would you have saved four months from now, if you saved $10 each month?

**Instructions**

- Create a variable monthly_savings, equal to 10 and num_months, equal to 4.
- Multiply monthly_savings by num_months and save it to new_savings.
- Add new_savings to savings, saving the sum as total_savings.
- Print the value of total_savings.

In [76]:
savings = 100

# Create the variables monthly_savings and num_months
monthly_savings = 10
num_months = 4

# Multiply monthly_savings and num_months
new_savings = monthly_savings * num_months

# Add new_savings to your savings
total_savings = savings + new_savings

# Print total_savings
print(total_savings)

140


### Exercise

**Other variable types**

In the previous exercise, you worked with the integer Python data type:

- int, or integer: a number without a fractional part. savings, with the value 100, is an example of an integer.

Besides int, there are three other very common data types:

- float, or floating point: a number that has both an integer and fractional part, separated by a point. 1.1 is an example of a float.
- str, or string: a type to represent text. You can use single or double quotes to build a string.
- bool, or boolean: a type to represent logical values. It can only be True or False (the capitalization is important!).

**Instructions**

- Create a new float, half, with the value 0.5.
- Create a new string, intro, with the value "Hello! How are you?".
- Create a new boolean, is_good, with the value True.

In [77]:
# Create a variable half
half = 0.5

# Create a variable intro
intro = 'Hello! How are you?'

# Create a variable is_good
is_good = True

In [78]:
a = 0.5
b = 'Hello'
c = False

### Exercise

**Guess the type**

To find out the type of a value or a variable that refers to that value, you can use the type() function. Suppose you've defined a variable a, but you forgot the type of this variable. To determine the type of a, simply execute:

type(a)

We already went ahead and created three variables: a, b and c. You can use the IPython shell to discover their type. Which of the following options is correct?

**Instructions**

![image.png](attachment:image.png)

In [79]:
print(type(a))
print(type(b))
print(type(c))

<class 'float'>
<class 'str'>
<class 'bool'>


### Exercise

**Operations with other types**

Hugo mentioned that different types behave differently in Python.

When you sum two strings, for example, you'll get different behavior than when you sum two integers or two booleans.

In the script some variables with different types have already been created. It's up to you to use them.

**Instructions**

- Calculate the product of monthly_savings and num_months. Store the result in year_savings.
- What do you think the resulting type will be? Find out by printing out the type of year_savings.
- Calculate the sum of intro and intro and store the result in a new variable doubleintro.
- Print out doubleintro. Did you expect this?

In [80]:
monthly_savings = 10
num_months = 12
intro = "Hello! How are you?"

# Calculate year_savings using monthly_savings and num_months
year_savings = monthly_savings * num_months

# Print the type of year_savings
print(type(year_savings))

# Assign sum of intro and intro to doubleintro
doubleintro = intro + intro

# Print out doubleintro
print(doubleintro)

<class 'int'>
Hello! How are you?Hello! How are you?


### Exercise

**Type conversion**

Using the + operator to paste together two strings can be very useful in building custom messages.

Suppose, for example, that you've calculated your savings and want to summarize the results in a string.

To do this, you'll need to explicitly convert the types of your variables. More specifically, you'll need str(), to convert a value into a string. str(savings), for example, will convert the integer savings to a string.

Similar functions such as int(), float() and bool() will help you convert Python values into any type.
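A brief sketch of these conversion functions follows. Only `savings = 100` comes from the text; the other names and values are chosen for illustration:

```python
savings = 100

savings_text = str(savings)    # '100'
whole = int("42")              # 42
ratio = float("3.5")           # 3.5
empty_is_false = bool("")      # False: the empty string is falsy
nonempty_is_true = bool("no")  # True: any non-empty string is truthy

print("I have $" + savings_text)
```

Not every conversion succeeds: `int("3.5")`, for example, raises a `ValueError` because the text is not a valid integer literal.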

**Instructions**

- Hit Run Code to run the code. Try to understand the error message.
- Fix the code such that the printout runs without errors; use the function str() to convert the variables savings and total_savings to strings.
- Convert the variable pi_string to a float and store this float as a new variable, pi_float.

In [81]:
# Definition of savings and total_savings
savings = 100
total_savings = 150

# Fix the printout
#print("I started with $" + savings + " and now have $" + total_savings + ". Awesome!")
print("I started with $" + str(savings) + " and now have $" + str(total_savings) + ". Awesome!")

# Definition of pi_string
pi_string = "3.1415926"

# Convert pi_string into float: pi_float
pi_float = float(pi_string)

I started with $100 and now have $150. Awesome!


### Exercise

**Can Python handle everything?**

Now that you know something more about combining different sources of information, have a look at the four Python expressions below. Which one of these will throw an error? You can always copy and paste this code in the IPython Shell to find out!

**Instructions**

![image.png](attachment:image.png)

## <a id="2"></a>
<font color="lightseagreen" size=+2.5><b>2. Python Lists</b></font>

<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Table of Contents</a>

Learn to store, access, and manipulate data in lists. You'll create lists, nest lists inside lists, and select elements with indexing and slicing.

### 1. Python Lists

Welcome back aspiring Pythonista. By now, you've played around with different data types, and I hope you've had as much fun as I have.

**2. Python Data Types**

![image.png](attachment:image.png)

On the numbers side, there's the float, to represent a real number, and the int, to represent an integer. Next, we also have str, short for string, to represent text in Python, and bool, which can be either True or False. You can save these values as a variable, like these examples show. Each variable then represents a single value. As a data scientist,

**3. Problem**

![image-2.png](attachment:image-2.png)

you'll often want to work with many data points. If you, for example, want to measure the height of everybody in your family and store this information in Python, it would be inconvenient to create a new Python variable for each point you collected, right? What you can do instead is store all this information in a Python list.

**4. Python List**

![image-3.png](attachment:image-3.png)

You can build such a list with square brackets. Suppose you asked your two sisters and parents for their height, in meters. You can build the list as follows: Of course, this data structure can also be referenced with a variable. Simply put the variable name and the equals sign in front, like here. A list is a way to give a single name to a collection of values. These values, or elements, can have any type; they can be floats, integers, booleans, strings, but also more advanced Python types, even lists. It's perfectly possible for a list to contain different types as well.
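The slide itself is not reproduced here. As a sketch, a list of four heights might look like the following; 1.68 and 1.71 are quoted later in the transcript, while 1.73, 1.89 and the `mixed` list are assumed for illustration:

```python
# Four heights in meters, collected under a single name
fam = [1.73, 1.68, 1.71, 1.89]
print(fam)

# Elements may have different types, including another list
mixed = [1.73, "liz", True, [1, 2]]
print(len(mixed))  # 4
```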

**5. Python List**

![image-4.png](attachment:image-4.png)

Suppose, for example, that you want to add the names of your sisters and parents to the list, so that you know which height belongs to who. You can throw in some strings without issues. But that's not all. I just told you that lists can also contain lists themselves. Instead of putting the strings in between the numbers, you can create little sublists for each member of the family. One for liz, one for emma and so on. Now, you can tell Python that these sublists are the elements of another list, that I named fam2: the little lists are wrapped in square brackets and separated with commas. If you now print out fam2, you see that we have a list of lists. The main list contains 4 sub-lists. We're dealing with a new Python type here, next to the strings, booleans, integers and floats you already know about:

**6. List type**

![image-5.png](attachment:image-5.png)

the list. These calls show that both fam and fam2 are lists. Remember that I told you that each type has specific functionality and behavior associated? Well, for lists, this is also true. Python lists host a bunch of tools to subset and adapt them. But let's take this step by step,

**7. Let's practice!**

and have you experiment with list creation first!
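Before the exercises, here is a sketch of the two lists discussed above, reconstructed from the transcript. The heights for liz and dad are assumptions, since those values are not quoted in the text:

```python
# Flat list: names and heights alternate
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]

# List of lists: one [name, height] sublist per family member
fam2 = [["liz", 1.73],
        ["emma", 1.68],
        ["mom", 1.71],
        ["dad", 1.89]]

print(fam2)
print(len(fam2))   # 4 sublists
print(type(fam))   # <class 'list'>
print(type(fam2))  # <class 'list'>
```

The nested form keeps each name next to its height, which makes the pairing explicit instead of relying on position.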

### Exercise

**Create a list**

As opposed to int, bool etc., a list is a compound data type; you can group values together:

- a = "is"
- b = "nice"
- my_list = ["my", "list", a, b]

After measuring the height of your family, you decide to collect some information on the house you're living in. The areas of the different parts of your house are stored in separate variables for now, as shown in the script.

**Instructions**

- Create a list, areas, that contains the area of the hallway (hall), kitchen (kit), living room (liv), bedroom (bed) and bathroom (bath), in this order. Use the predefined variables.
- Print areas with the print() function.

In [82]:
# area variables (in square meters)
hall = 11.25
kit = 18.0
liv = 20.0
bed = 10.75
bath = 9.50

# Create list areas
areas = [hall, kit, liv, bed, bath]

# Print areas
print(areas)

[11.25, 18.0, 20.0, 10.75, 9.5]


### Exercise

**Create list with different types**

A list can contain any Python type. Although it's not really common, a list can also contain a mix of Python types including strings, floats, booleans, etc.

The printout of the previous exercise wasn't really satisfying. It's just a list of numbers representing the areas, but you can't tell which area corresponds to which part of your house.

The code in the editor is the start of a solution. For some of the areas, the name of the corresponding room is already placed in front. Pay attention here! "bathroom" is a string, while bath is a variable that represents the float 9.50 you specified earlier.

**Instructions**

- Finish the code that creates the areas list. Build the list so that the list first contains the name of each room as a string and then its area. In other words, add the strings "hallway", "kitchen" and "bedroom" at the appropriate locations.
- Print areas again; is the printout more informative this time?

![image.png](attachment:image.png)

In [83]:
# area variables (in square meters)
hall = 11.25
kit = 18.0
liv = 20.0
bed = 10.75
bath = 9.50

# Adapt list areas
areas = ["hallway", hall, "kitchen", kit, "living room", liv, "bedroom", bed, "bathroom", bath]

# Print areas
print(areas)

['hallway', 11.25, 'kitchen', 18.0, 'living room', 20.0, 'bedroom', 10.75, 'bathroom', 9.5]


### Exercise

**Select the valid list**

A list can contain any Python type. But a list itself is also a Python type. That means that a list can also contain a list! Python is getting funkier by the minute, but fear not, just remember the list syntax:

- my_list = [el1, el2, el3]

Can you tell which ones of the following lines of Python code are valid ways to build a list?

- A. [1, 3, 4, 2] B. [[1, 2, 3], [4, 5, 7]] C. [1 + 2, "a" * 5, 3]

![image.png](attachment:image.png)

### Exercise

**List of lists**

As a data scientist, you'll often be dealing with a lot of data, and it will make sense to group some of this data.

Instead of creating a flat list containing strings and floats, representing the names and areas of the rooms in your house, you can create a list of lists. The script in the editor can already give you an idea.

Don't get confused here: "hallway" is a string, while hall is a variable that represents the float 11.25 you specified earlier.

**Instructions**

- Finish the list of lists so that it also contains the bedroom and bathroom data. Make sure you enter these in order!
- Print out house; does this way of structuring your data make more sense?
- Print out the type of house. Are you still dealing with a list?

![image.png](attachment:image.png)

In [84]:
# area variables (in square meters)
hall = 11.25
kit = 18.0
liv = 20.0
bed = 10.75
bath = 9.50

# house information as list of lists
house = [["hallway", hall],
         ["kitchen", kit],
         ["living room", liv],
         ["bedroom", bed],
         ["bathroom", bath]]

# Print out house
print(house)

# Print out the type of house
print(type(house))

[['hallway', 11.25], ['kitchen', 18.0], ['living room', 20.0], ['bedroom', 10.75], ['bathroom', 9.5]]
<class 'list'>


### 1. Subsetting Lists

After you've created your very own Python list, you'll need to know how you can access information in the list.

**2. Subsetting lists**

![image.png](attachment:image.png)

Python uses the index to do this. Have a look at the fam list again here. The first element in the list has index 0, the second element has index 1, and so on. Suppose that you want to select the height of emma, the float 1-point-68. It's the fourth element, so it has index 3. To select it, you use 3 inside square brackets. Similarly, to select the string "dad" from the list,

**3. Subsetting lists**

![image-2.png](attachment:image-2.png)

which is the seventh element in the list, you'll need to put the index 6 inside square brackets. You can also count backwards, using negative indexes. This is useful if you want to get some elements at the end of your list. To get your dad's height, for example, you'll need the index -1. These are the negative indexes for all list elements.
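The indexing described above can be sketched as follows, assuming the `fam` list has the layout implied by the transcript (the heights for liz and dad are made up for illustration):

```python
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]

print(fam[3])   # 1.68, the fourth element (emma's height)
print(fam[6])   # dad, the seventh element
print(fam[-1])  # 1.89, the last element (dad's height)

# A positive and a negative index can point at the same element
print(fam[7] == fam[-1])  # True
```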

**4. Subsetting lists**

![image-3.png](attachment:image-3.png)

This means that both these lines return the exact same result. Apart from indexing, there's also something called slicing,

**5. List slicing**

![image-4.png](attachment:image-4.png)

which allows you to select multiple elements from a list, thus creating a new list. You can do this by specifying a range, using a colon. Let's first have another look at the list, and then try this piece of code. Can you guess what it'll return? A list with the float 1-point-68, the string "mom", and the float 1-point-71, corresponding to the 4th, 5th and 6th element in the list maybe? Let's see what the output is. Apparently, only the elements with index 3 and 4 get returned. The element with index 5 is not included. In general, this is the syntax: the index you specify before the colon, so where the slice starts, is included, while the index you specify after the colon, where the slice ends, is not. With this in mind, can you tell what this call will return? You probably guessed correctly that this call gives you a list with three elements, corresponding to the elements with index 1, 2 and 3 of the fam list. You can also choose to just leave out the index before or after the colon.
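The two slices discussed here can be sketched as below. The slice bounds are inferred from the description, and the heights for liz and dad are assumptions:

```python
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]

# Start index 3 is included, end index 5 is not
print(fam[3:5])  # [1.68, 'mom']

# Elements with index 1, 2 and 3
print(fam[1:4])  # [1.73, 'emma', 1.68]
```

A handy consequence of the exclusive end index is that `end - start` gives the number of elements in the slice.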

**6. List slicing**

![image-5.png](attachment:image-5.png)

If you leave out the index where the slice should begin, you're telling Python to start the slice from index 0, like this example. If you leave out the index where the slice should end, you include all elements up to and including the last element in the list, like here. Now it's time to head over to the exercises,

**7. Let's practice!**

where you will continue to work on the list you've created yourself before. You'll use different subsetting methods to get exactly the piece of information you need!
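As a sketch of slices with an omitted start or end, the snippet below uses bounds chosen for illustration (the slide's exact values are not reproduced in these notes):

```python
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]

# No start index: the slice begins at index 0
print(fam[:4])  # ['liz', 1.73, 'emma', 1.68]

# No end index: the slice runs through the last element
print(fam[5:])  # [1.71, 'dad', 1.89]
```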

### Exercise

**Subset and conquer**

Subsetting Python lists is a piece of cake. Take the code sample below, which creates a list x and then selects "b" from it. Remember that this is the second element, so it has index 1. You can also use negative indexing.

- x = ["a", "b", "c", "d"]
- x[1]
- x[-3] # same result!

Remember the areas list from before, containing both strings and floats? Its definition is already in the script. Can you add the correct code to do some Python subsetting?

**Instructions**

- Print out the second element from the areas list (it has the value 11.25).
- Subset and print out the last element of areas, being 9.50. Using a negative index makes sense here!
- Select the number representing the area of the living room (20.0) and print it out.

In [85]:
# Create the areas list
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]

# Print out second element from areas
print(areas[1])

# Print out last element from areas
print(areas[-1])

# Print out the area of the living room
print(areas[5])

11.25
9.5
20.0


### Exercise

**Subset and calculate**

After you've extracted values from a list, you can use them to perform additional calculations. Take this example, where the second and fourth element of a list x are extracted. The strings that result are pasted together using the + operator:

- x = ["a", "b", "c", "d"]
- print(x[1] + x[3])

**Instructions**

- Using a combination of list subsetting and variable assignment, create a new variable, eat_sleep_area, that contains the sum of the area of the kitchen and the area of the bedroom.
- Print the new variable eat_sleep_area.

In [86]:
# Create the areas list
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]

# Sum of kitchen and bedroom area: eat_sleep_area
eat_sleep_area = areas[3] + areas[7]

# Print the variable eat_sleep_area
print(eat_sleep_area)

28.75


### Exercise

**Slicing and dicing**

Selecting single values from a list is just one part of the story. It's also possible to slice your list, which means selecting multiple elements from your list. Use the following syntax:

- my_list[start:end]

The start index will be included, while the end index is not.

The code sample below shows an example. A list with "b" and "c", corresponding to indexes 1 and 2, is selected from a list x:

- x = ["a", "b", "c", "d"]
- x[1:3]

The elements with index 1 and 2 are included, while the element with index 3 is not.

**Instructions**

- Use slicing to create a list, downstairs, that contains the first 6 elements of areas.
- Do a similar thing to create a new variable, upstairs, that contains the last 4 elements of areas.
- Print both downstairs and upstairs using print().

In [87]:
# Create the areas list
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]

# Use slicing to create downstairs
downstairs = areas[0:6]

# Use slicing to create upstairs
upstairs = areas[-4:10]

# Print out downstairs and upstairs
print(downstairs, upstairs)

['hallway', 11.25, 'kitchen', 18.0, 'living room', 20.0] ['bedroom', 10.75, 'bathroom', 9.5]


### Exercise

**Slicing and dicing (2)**

In the video, Hugo first discussed the syntax where you specify both where to begin and end the slice of your list:

- my_list[begin:end]

However, it's also possible not to specify these indexes. If you don't specify the begin index, Python figures out that you want to start your slice at the beginning of your list. If you don't specify the end index, the slice will go all the way to the last element of your list. To experiment with this, try the following commands in the IPython Shell:

- x = ["a", "b", "c", "d"]
- x[:2]
- x[2:]
- x[:]

**Instructions**

- Create downstairs again, as the first 6 elements of areas. This time, simplify the slicing by omitting the begin index.
- Create upstairs again, as the last 4 elements of areas. This time, simplify the slicing by omitting the end index.

In [88]:
# Create the areas list
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]

# Alternative slicing to create downstairs
downstairs = areas[:6]

# Alternative slicing to create upstairs
upstairs = areas[-4:]

print(downstairs, upstairs)

['hallway', 11.25, 'kitchen', 18.0, 'living room', 20.0] ['bedroom', 10.75, 'bathroom', 9.5]


### Exercise

**Subsetting lists of lists**

You saw before that a Python list can contain practically anything; even other lists! To subset lists of lists, you can use the same technique as before: square brackets. Try out the commands in the following code sample in the IPython Shell:

- x = [["a", "b", "c"],["d", "e", "f"],["g", "h", "i"]]
- x[2][0]
- x[2][:2]

x[2] results in a list, that you can subset again by adding additional square brackets.
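The chained subsetting above can be read one step at a time. The sketch below uses the same x as the code sample; the intermediate variable last_row is only introduced here for illustration.

```python
x = [["a", "b", "c"],
     ["d", "e", "f"],
     ["g", "h", "i"]]

last_row = x[2]      # first subset: the third inner list
print(last_row)      # ['g', 'h', 'i']

print(x[2][0])       # g  (first element of that inner list)
print(x[2][:2])      # ['g', 'h']  (slice of that inner list)
```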

What will house[-1][1] return? house, the list of lists that you created before, is already defined for you in the workspace. You can experiment with it in the IPython Shell.

**Instructions**

![image.png](attachment:image.png)

In [89]:
house

[['hallway', 11.25],
 ['kitchen', 18.0],
 ['living room', 20.0],
 ['bedroom', 10.75],
 ['bathroom', 9.5]]

In [90]:
house[-1][1]

9.5

### 1. Manipulating Lists

Wow, you're doing super well. So now, after creation and subsetting, the final piece of the Python lists puzzle is

**2. List Manipulation**

![image.png](attachment:image.png)

manipulation, so ways to change elements in your list, or to add elements to and remove elements from your list.

**3. Changing list elements**

![image-2.png](attachment:image-2.png)

Changing list elements is pretty straightforward. You use the same square brackets that we've used to subset lists, and then assign new elements to it using the equals sign. Suppose that after another look at fam, you realize that your dad's height is not up to date anymore, as he's shrinking with age. Instead of 1-point-89 meters, it should be 1-point-86 meters. To change this list element, which is at index 7, you can use this line of code. If you now check out fam, you'll see that the value is updated. You can even change an entire list slice at once. To change the elements "liz" and 1-point-73, you access the first two elements with 0:2, and then assign a new list to it. Do you still remember how the plus operator was different for strings and integers?
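A minimal sketch of both kinds of assignment described above. The contents of fam are reconstructed from the transcript, and the replacement values "lisa" and 1.74 are placeholders chosen for this example, since the slide itself is not reproduced here.

```python
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]  # assumed contents

# Replace a single element: dad's height lives at index 7
fam[7] = 1.86
print(fam[7])   # 1.86

# Replace a whole slice: the first two elements
# ("lisa" and 1.74 are placeholder values)
fam[0:2] = ["lisa", 1.74]
print(fam)      # ['lisa', 1.74, 'emma', 1.68, 'mom', 1.71, 'dad', 1.86]
```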

**4. Adding and removing elements**

![image-3.png](attachment:image-3.png)

Well, it's again different for lists. If you use the plus sign with two lists, Python simply pastes together their contents in a single list. Suppose you want to add your own name and height to the fam height list. This will do the trick. Of course, you can also store this new list in a variable, fam_ext for example. Finally, deleting elements from a list is also pretty straightforward, you'll have to use del here. Take this line, for example, that deletes the element with index 2, so "emma", from the list. If you check out fam now, you'll see that the "emma" string is gone. Because you've removed an index, all elements that came after "emma" scooted over by one index. If you again run the same line, you're again removing the element at index 2, which is emma's height, 1-point-68 meters now. Understanding how Python lists actually work

**5. Behind the scenes (1)**

![image-4.png](attachment:image-4.png)

behind the scenes becomes pretty important now. What actually happens when you create a new list, x, like this? Well, in a simplified sense, you're storing a list in your computer memory, and store the 'address' of that list, so

**6. Behind the scenes (1)**

![image-5.png](attachment:image-5.png)

where the list is in your computer memory, in x. This means that x does not actually contain all the list elements, it rather contains a reference to the list. For basic operations, the difference is not that important, but it becomes more so when you start copying lists. Let me clarify this with an example. Let's store the list x as a new variable y, by simply using the equals sign. Let's now change the element with index one in the list y, like this. The funky thing is that if you now check out x again, also here the second element was changed. That's because when you copied x to y with the equals sign,

**7. Behind the scenes (1)**

![image-6.png](attachment:image-6.png)

you copied the reference to the list, not the actual values themselves.

**8. Behind the scenes (1)**

![image-7.png](attachment:image-7.png)

When you're updating an element of the list, it's one and the same list in the computer memory you're changing. Both x and y point to this list, so the update is visible from both variables. If you want to create a list y that points to a new list in the memory with the same values,

**9. Behind the scenes (2)**

![image-8.png](attachment:image-8.png)

you'll need to use something other than the equals sign. You can use the list function,

**10. Behind the scenes (2)**

![image-9.png](attachment:image-9.png)

like this, or use slicing to select all list elements explicitly. If you now

**11. Behind the scenes (2)**

![image-10.png](attachment:image-10.png)

make a change to the list y points to, x is not affected. If this was a bit too much to take in, don't worry.

**12. Let's practice!**

The exercises will help you understand list manipulation and the subtle inner workings of lists. I'm sure you'll do great!
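The "behind the scenes" behavior can be checked directly. This is a small sketch with a made-up three-element list; the `is` operator is used here only to show whether two names refer to the same list object.

```python
x = ["a", "b", "c"]

# Plain assignment copies the reference, not the list
y = x
y[1] = "z"
print(x)          # ['a', 'z', 'c']  (x changed too)
print(y is x)     # True: both names point to the same list

# list() or a full slice builds a new list with the same values
x = ["a", "b", "c"]
y = list(x)       # equivalently: y = x[:]
y[1] = "z"
print(x)          # ['a', 'b', 'c']  (x is unaffected)
print(y is x)     # False
```

Note that both list() and [:] make a shallow copy: if the list contains other lists, those inner lists are still shared between the original and the copy.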

### Exercise

**Replace list elements**

Replacing list elements is pretty easy. Simply subset the list and assign new values to the subset. You can select single elements or you can change entire list slices at once.

Use the IPython Shell to experiment with the commands below. Can you tell what's happening and why?

- x = ["a", "b", "c", "d"]
- x[1] = "r"
- x[2:] = ["s", "t"]

For this and the following exercises, you'll continue working on the areas list that contains the names and areas of different rooms in a house.

**Instructions**

- Update the area of the bathroom to be 10.50 square meters instead of 9.50.
- Make the areas list more trendy! Change "living room" to "chill zone".

![image.png](attachment:image.png)

In [91]:
# Create the areas list
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]

# Correct the bathroom area
areas[9] = 10.50

# Change "living room" to "chill zone"
areas[4] = "chill zone"

areas

['hallway',
 11.25,
 'kitchen',
 18.0,
 'chill zone',
 20.0,
 'bedroom',
 10.75,
 'bathroom',
 10.5]

### Exercise

**Extend a list**

If you can change elements in a list, you sure want to be able to add elements to it, right? You can use the + operator:

- x = ["a", "b", "c", "d"]
- y = x + ["e", "f"]

You just won the lottery, awesome! You decide to build a poolhouse and a garage. Can you add the information to the areas list?

**Instructions**

- Use the + operator to paste the list ["poolhouse", 24.5] to the end of the areas list. Store the resulting list as areas_1.
- Further extend areas_1 by adding data on your garage. Add the string "garage" and float 15.45. Name the resulting list areas_2.

![image.png](attachment:image.png)

In [92]:
# Create the areas list and make some changes
areas = ["hallway", 11.25, "kitchen", 18.0, "chill zone", 20.0,
         "bedroom", 10.75, "bathroom", 10.50]

# Add poolhouse data to areas, new list is areas_1
areas_1 = areas + ["poolhouse", 24.5]
# areas.append("poolhouse")  --> append() can add only one element at a time
# areas.insert(10, "poolhouse") --> insert() can also add only one element at a time,
                                  # but lets us choose the index to insert at
# areas.extend(["garage", 15.45]) --> extend() adds elements too: it adds each element of the list passed to it, one by one

# Add garage data to areas_1, new list is areas_2
areas_2 = areas_1 + ["garage", 15.45]

areas_2

['hallway',
 11.25,
 'kitchen',
 18.0,
 'chill zone',
 20.0,
 'bedroom',
 10.75,
 'bathroom',
 10.5,
 'poolhouse',
 24.5,
 'garage',
 15.45]
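The commented-out alternatives in the cell above (append(), insert() and extend()) differ from + in one important way: they change the list they are called on. A short sketch with a shortened areas list; "attic", 8.0 and "house" are made-up values for illustration.

```python
areas = ["hallway", 11.25, "kitchen", 18.0]

# + builds a new list and leaves areas untouched
areas_1 = areas + ["poolhouse", 24.5]
print(len(areas), len(areas_1))   # 4 6

# The methods below change the list they are called on
in_place = list(areas)            # work on a copy so areas stays intact
in_place.append("garage")         # adds one element at the end
in_place.extend(["attic", 8.0])   # adds each element of the given list
in_place.insert(0, "house")       # adds one element at the given index
print(in_place)
# ['house', 'hallway', 11.25, 'kitchen', 18.0, 'garage', 'attic', 8.0]
```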

### Exercise

**Delete list elements**

Finally, you can also remove elements from your list. You can do this with the del statement:

- x = ["a", "b", "c", "d"]
- del(x[1])

Pay attention here: as soon as you remove an element from a list, the indexes of the elements that come after the deleted element all change!
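The index shift is easy to see on the small list x from the sample above. The second part, deleting a slice, is an extra illustration on a separate made-up list y.

```python
x = ["a", "b", "c", "d"]

del x[1]        # removes "b"
print(x)        # ['a', 'c', 'd']
print(x[1])     # c  (what used to be at index 2 moved to index 1)

# Deleting a slice removes several neighbouring elements in one step
y = ["a", "b", "c", "d"]
del y[1:3]      # removes "b" and "c"
print(y)        # ['a', 'd']
```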

The updated and extended version of areas that you've built in the previous exercises is coded below. You can copy and paste this into the IPython Shell to play around with the result.

areas = ["hallway", 11.25, "kitchen", 18.0,
        "chill zone", 20.0, "bedroom", 10.75,
         "bathroom", 10.50, "poolhouse", 24.5,
         "garage", 15.45]

There was a mistake! The amount you won with the lottery is not that big after all and it looks like the poolhouse isn't going to happen. You decide to remove the corresponding string and float from the areas list.

The ; sign is used to place commands on the same line. The following two code chunks are equivalent:

#Same line

- command1; command2

#Separate lines

- command1
- command2

Which of the code chunks will do the job for us?

![image.png](attachment:image.png)

### Exercise

**Inner workings of lists**

At the end of the video, Hugo explained how Python lists work behind the scenes. In this exercise you'll get some hands-on experience with this.

The Python code in the script already creates a list with the name areas and a copy named areas_copy. Next, the first element in the areas_copy list is changed and the areas list is printed out. If you hit Run Code you'll see that, although you've changed areas_copy, the change also takes effect in the areas list. That's because areas and areas_copy point to the same list.

If you want to prevent changes in areas_copy from also taking effect in areas, you'll have to do a more explicit copy of the areas list. You can do this with list() or by using [:].

**Instructions**

- Change the second command, that creates the variable areas_copy, such that areas_copy is an explicit copy of areas. After your edit, changes made to areas_copy shouldn't affect areas. Submit the answer to check this.

In [93]:
# Create list areas
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# Create areas_copy
areas_copy = areas

# Change areas_copy
areas_copy[0] = 5.0

# Print areas
print(areas)

[5.0, 18.0, 20.0, 10.75, 9.5]


In [94]:
# Create list areas
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# Create areas_copy
areas_copy = areas.copy()

# Change areas_copy
areas_copy[0] = 5.0

# Print areas
print(areas)

[11.25, 18.0, 20.0, 10.75, 9.5]


## <a id="3"></a>
<font color="lightseagreen" size=+2.5><b>3. Functions and Packages</b></font>

<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Table of Contents</a>

You'll learn how to use functions, methods, and packages to efficiently leverage the code that brilliant Python developers have written. The goal is to reduce the amount of code you need to solve challenging problems!

### 1. Functions

In this video, I'm going to introduce you to functions. Once you learn about them you won't be able to stop using them. I sure can't.

**2. Functions**

![image.png](attachment:image.png)

Functions aren't entirely new for you actually: you've already used them. type, for example, is a function that returns the type of a value. But what is a function? Simply put, a function is a piece of reusable code, aimed at solving a particular task. You can call functions instead of having to write code yourself. Maybe an example can clarify things here.

**3. Example**

![image-2.png](attachment:image-2.png)

Suppose you have the list containing only the heights of your family, fam. Say that you want to get the maximum value in this list. Instead of writing your own piece of Python code that goes through the list and finds the highest value, you can also use Python's max function. This is one of Python's built-in functions, just like type. We simply pass fam to max inside parentheses. The output makes sense: 1-point-89, the highest number in the list. max worked kind of like a black box here:

**4. Example**

![image-3.png](attachment:image-3.png)

you passed it a list, then the implementation of max, that you don't know, did its magic,

**5. Example**

![image-4.png](attachment:image-4.png)

and produced an output. How max actually did this, is not important to you, it just does what it's supposed to, and you didn't have to write your own code, which made your life easier.

**6. Example**

![image-5.png](attachment:image-5.png)

Of course, it's possible to also assign the result of a function call to a new variable, like here. Now tallest is just like any other variable; you can use it to continue your fancy calculations.
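The max example from the slides can be sketched as follows. The four heights are taken from the values mentioned elsewhere in the transcript, so the exact list is an assumption.

```python
fam = [1.73, 1.68, 1.71, 1.89]   # heights only (assumed values)

tallest = max(fam)               # store the result of the function call
print(tallest)                   # 1.89

# tallest is an ordinary float, so it can be used in further calculations
print(tallest > 1.8)             # True
```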

**7. round()**

![image-6.png](attachment:image-6.png)

Another one of these built-in functions is round. It takes two inputs: first, a number you want to round, and second, the precision with which to round, which is how many digits after the decimal point you want to keep. Say you want to round 1-point-68 to one decimal place. The first input is 1-point-68, the second input is 1. You separate the inputs with a comma. But there's more. It's perfectly possible to call the round function with only one input, like this. This time, Python figured out that you didn't specify the second input, and automatically chooses to round the number to the closest integer. To understand why both approaches work, let's open up the documentation. You can do this with yet another function, help, like this. It appears that round takes two inputs.

**8. round()**

![image-7.png](attachment:image-7.png)

In Python, these inputs, also called arguments, have names: number and ndigits. When you call the function round,

**9. round()**

![image-8.png](attachment:image-8.png)

with these two inputs, Python matches the inputs to the arguments:

**10. round()**

![image-9.png](attachment:image-9.png)

number is set to 1-point-68 and

**11. round()**

![image-10.png](attachment:image-10.png)

ndigits is set to 1. Next,

**12. round()**

![image-11.png](attachment:image-11.png)

The round function does its calculations with number and ndigits as if they are Python variables in a script. We don't know exactly what code Python executes. What is important, though, is that the function produces an output,

**13. round()**

![image-12.png](attachment:image-12.png)

namely the number 1-point-68 rounded to 1 decimal place.

**14. round()**

![image-13.png](attachment:image-13.png)

If you call the function round with only one input,

**15. round()**

![image-14.png](attachment:image-14.png)

Python again tries to

**16. round()**

![image-15.png](attachment:image-15.png)

match the inputs to

**17. round()**

![image-16.png](attachment:image-16.png)

the arguments. There's no input to match to the ndigits argument though. Luckily,

**18. round()**

![image-17.png](attachment:image-17.png)

the internal machinery of the round function knows how to handle this. When ndigits is not specified, the function simply rounds to the closest integer and

**19. round()**

![image-18.png](attachment:image-18.png)

returns that integer. That's why we got the number 2.

**20. round()**

![image-19.png](attachment:image-19.png)

In other words, ndigits is an optional argument. This tells us that you can call round in this form, as well as in this one.
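Both call forms of round discussed above, side by side. This is a small sketch using the same number as the slides.

```python
# Two inputs: number and ndigits
print(round(1.68, 1))   # 1.7

# One input: ndigits is left out, so the result is the closest integer
print(round(1.68))      # 2

# The first form returns a float, the second an int
print(type(round(1.68, 1)), type(round(1.68)))
# <class 'float'> <class 'int'>
```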

**21. Find functions**

![image-20.png](attachment:image-20.png)

By now, you have an idea about how to use max and round, but how could you know that a function such as round exists in Python in the first place? Well, this is something you will learn with time. Whenever you are doing a rather standard task in Python, you can be pretty sure that there's already a function that can do this for you. In that case, you should definitely use it! Just do a quick internet search and you'll find the function you need with a nice usage example. And there is of course DataCamp, where you'll also learn about powerful functions and how to use them.

**22. Let's practice!**

Get straight to it in the interactive exercises, and I'll see you back here soon!

[Python List Functions & Methods Tutorial and Examples](https://www.datacamp.com/tutorial/python-list-function)

[Built-in Functions](https://docs.python.org/3/library/functions.html)

[Python Built in Functions](https://www.w3schools.com/python/python_ref_functions.asp)

### Exercise

**Familiar functions**

Out of the box, Python offers a bunch of built-in functions to make your life as a data scientist easier. You already know two such functions: print() and type(). You've also used the functions str(), int(), bool() and float() to switch between data types. These are built-in functions as well.

Calling a function is easy. To get the type of 3.0 and store the output as a new variable, result, you can use the following:

- result = type(3.0)

The general recipe for calling functions and saving the result to a variable is thus:

- output = function_name(input)

**Instructions**

- Use print() in combination with type() to print out the type of var1.
- Use len() to get the length of the list var1. Wrap it in a print() call to directly print it out.
- Use int() to convert var2 to an integer. Store the output as out2.

![image.png](attachment:image.png)

In [95]:
# Create variables var1 and var2
var1 = [1, 2, 3, 4]
var2 = True

# Print out type of var1
print(type(var1))

# Print out length of var1
print(len(var1))

# Convert var2 to an integer: out2
out2 = int(var2)
print(out2)

<class 'list'>
4
1


### Exercise

**Help!**

Maybe you already know the name of a Python function, but you still have to figure out how to use it. Ironically, you have to ask for information about a function with another function: help(). In IPython specifically, you can also use ? before the function name.

To get help on the max() function, for example, you can use one of these calls:

- help(max)
- ?max

Use the IPython Shell to open up the documentation on pow(). Which of the following statements is true?

**Instructions**

![image.png](attachment:image.png)

In [96]:
help(pow)

Help on built-in function pow in module builtins:

pow(base, exp, mod=None)
    Equivalent to base**exp with 2 arguments or base**exp % mod with 3 arguments
    
    Some types, such as ints, are able to use a more efficient algorithm when
    invoked using the three argument form.
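To see what the help text means in practice, here is a short sketch with made-up numbers, comparing the two-argument and three-argument forms of pow() with the equivalent operator expressions.

```python
# Two arguments: same as base ** exp
print(pow(2, 10))        # 1024
print(2 ** 10)           # 1024

# Three arguments: same as (base ** exp) % mod
print(pow(2, 10, 1000))  # 24
print(2 ** 10 % 1000)    # 24
```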



### Exercise

**Multiple arguments**

In the previous exercise, you identified optional arguments by viewing the documentation with help(). You'll now apply this to change the behavior of the sorted() function.

Have a look at the documentation of sorted() by typing help(sorted) in the IPython Shell.

You'll see that sorted() takes three arguments: iterable, key, and reverse.

**key=None** means that if you don't specify the key argument, it will be None. **reverse=False** means that if you don't specify the reverse argument, it will be False, by default.

In this exercise, you'll only have to specify iterable and reverse, not key. The first input you pass to sorted() will be matched to the iterable argument, but what about the second input? To tell Python you want to specify reverse without changing anything about key, you can use = to assign it a new value:

- sorted(____, reverse=____)

Two lists have been created for you. Can you paste them together and sort them in descending order?

Note: For now, we can understand an [iterable](https://docs.python.org/3/glossary.html#term-iterable) as being any collection of objects, e.g., a List.
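A small sketch of how the optional arguments change what sorted() does. The list words is made-up example data, and key=len is just one possible key function.

```python
words = ["hallway", "kitchen", "attic", "bedroom"]   # made-up example data

print(sorted(words))                 # ['attic', 'bedroom', 'hallway', 'kitchen']
print(sorted(words, reverse=True))   # ['kitchen', 'hallway', 'bedroom', 'attic']

# key is applied to each element before comparing; here, sort by length
print(sorted(words, key=len))        # ['attic', 'hallway', 'kitchen', 'bedroom']

# sorted() returns a new list; the original is unchanged
print(words)                         # ['hallway', 'kitchen', 'attic', 'bedroom']
```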

**Instructions**

- Use + to merge the contents of first and second into a new list: full.
- Call sorted() on full and specify the reverse argument to be True. Save the sorted list as full_sorted.
- Finish off by printing out full_sorted.

![image.png](attachment:image.png)

In [97]:
# Create lists first and second
first = [11.25, 18.0, 20.0]
second = [10.75, 9.50]

# Paste together first and second: full
full = first + second

# Sort full in descending order: full_sorted
full_sorted = sorted(full, reverse=True)

# Print out full_sorted
print(full_sorted)

[20.0, 18.0, 11.25, 10.75, 9.5]


### 1. Methods

Built-in functions are only

**2. Built-in Functions**

![image.png](attachment:image.png)

one part of the Python story. You already know about functions such as max, to get the maximum of a list, len, to get the length of a list or a string, and so on. But what about other basic things, such as getting the index of a specific element in the list, or reversing a list? You can look very hard for built-in functions that do this, but you won't find them.

**3. Back 2 Basics**

![image-2.png](attachment:image-2.png)

In the past exercises, you've already created a bunch of variables. Among other Python types, you've created strings, floats and lists, like the ones you see here. Each one of these values or data structures is a so-called Python object. This string is an object, this float is an object, but this list is also, you got it, an object. These objects have a specific type, that you already know:

**4. Back 2 Basics**

![image-3.png](attachment:image-3.png)

string, float, and list, and of course they represent the values you gave them, such as "liz", 1-point-73 and an entire list. But in addition to this, Python objects also come with a bunch of so-called "methods". You can think of methods as functions that "belong to" Python objects. A Python object of type string has methods,

**5. Back 2 Basics**

![image-4.png](attachment:image-4.png)

such as capitalize and replace, but also objects of type float and list have specific methods depending on the type. Enough for the theory now; let's try to use a method!

**6. list methods**

![image-5.png](attachment:image-5.png)

Suppose you want to get the index of the string "mom" in the fam list. fam is a Python object with the type list, and has a method named index. To call the method, you use the dot notation, like this. The only input is the string "mom", the element you want to get the index for. Python returns 4, which indeed is the index of the string "mom". I called the index method "on" the fam list here, and the output was 4. Similarly, I can use the count method on the fam list to count the number of times 1-point-73 occurs in the list. Python gives me 1, which makes sense, because only liz is 1-point-73 meters tall. But lists are not the only Python objects that have methods associated. Also floats, integers, booleans and strings

[Python - List Methods](https://www.w3schools.com/python/python_lists_methods.asp)

**7. str methods**

![image-6.png](attachment:image-6.png)

are Python objects that have specific methods associated with them. Take the variable sister for example, that represents a string. You can call the method capitalize on sister, without any inputs. It returns a string where the first letter is capitalized now. Or what if you want to replace some parts of the string with other parts? Not a problem. Just call the method replace on sister, with two appropriate inputs. In the output, "z" is replaced with "sa".

[Python String Methods](https://www.w3schools.com/python/python_ref_string.asp)
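The method calls described in the two slides above can be sketched like this. The contents of fam and the value of sister are reconstructed from the transcript and should be read as assumptions.

```python
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]  # assumed contents
sister = "liz"                                               # assumed value

# list methods, called with the dot notation
print(fam.index("mom"))    # 4
print(fam.count(1.73))     # 1

# str methods return new strings; sister itself is unchanged
print(sister.capitalize())         # Liz
print(sister.replace("z", "sa"))   # lisa
print(sister)                      # liz
```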

**8. Methods**

![image-7.png](attachment:image-7.png)

To be absolutely clear: in Python, everything is an object, and each object has specific methods associated. Depending on the type of the object, list, string, float, whatever, the available methods are different. A string object like sister has a replace method, but a list like fam doesn't have this, as you can see from this error.
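The error mentioned above can be reproduced safely by catching it. This is a sketch with a shortened, assumed fam list; the arguments passed to replace are placeholders, because the call fails before they are used.

```python
sister = "liz"                 # assumed value
fam = ["liz", 1.73, "emma"]    # shortened, assumed list

# Strings have a replace method
print(sister.replace("z", "sa"))   # lisa

# Lists do not, so the same call raises an AttributeError
try:
    fam.replace("mom", "mommy")
except AttributeError as err:
    print(err)   # 'list' object has no attribute 'replace'
```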

**9. Methods**

![image-8.png](attachment:image-8.png)

Objects of different types can have methods with the same name. Take the index method. It's available for both strings and lists. If you call it on a string, you get the index of a letter in the string; if you call it on a list, you get the index of an element in the list. This means that, depending on the type of the object, the methods behave differently. Before I unleash you on some exercises on methods,

**10. Methods (2)**

![image-9.png](attachment:image-9.png)

there's one more thing I want to tell you. Some methods can change the objects they are called on. Let's retake the fam list, and call the append method on it. As the input, we pass a string we want to add to the list. Python doesn't generate an output, but if we check the fam list again, we see that it has been extended with the string "me". Let's do this again, this time to add my height to the list. Again, the fam list was extended. This is pretty cool, because you can write very concise code to update your data structures on the fly, but it can also be pretty dangerous. Some method calls don't change the object they're called on, while others do, so watch out.
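A sketch of the difference between a method that changes its object and one that does not. The shortened fam list and the height 1.79 are placeholders for illustration.

```python
fam = ["liz", 1.73, "emma", 1.68]   # shortened, assumed list

result = fam.append("me")   # changes fam in place
print(result)               # None: append() has no return value
fam.append(1.79)            # 1.79 is a placeholder height
print(fam)                  # ['liz', 1.73, 'emma', 1.68, 'me', 1.79]

# A str method, by contrast, returns a new object and leaves the original alone
name = "liz"
upper_name = name.upper()
print(name, upper_name)     # liz LIZ
```

Because append() returns None, writing fam = fam.append("me") would replace the list with None, which is a common slip to watch out for.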

**11. Summary**

![image-10.png](attachment:image-10.png)

Let's take a step back here and summarize this. You have Python functions, like type, max and round, that you can call like this. There are also methods, which are functions that are specific to Python objects. Depending on the type of the Python object you're dealing with, you'll be able to use different methods and they behave differently. You can call methods on the objects with the dot notation, like this, for example. There's much more to tell about Python objects, methods and how Python works internally,

**12. Let's practice!**

but for now, let's stick to what I've talked about here. It's time to get some exercises and add methods to your ever-growing skillset!

### Exercise

**String Methods**

Strings come with a bunch of methods. Follow the instructions closely to discover some of them. If you want to discover them in more detail, you can always type help(str) in the IPython Shell.

A string place has already been created for you to experiment with.

**Instructions**

- Use the upper() method on place and store the result in place_up. Use the syntax for calling methods that you learned in the previous video.
- Print out place and place_up. Did both change?
- Print out the number of o's in the variable place by calling count() on place and passing the letter 'o' as an input to the method. We're talking about the variable place, not the word "place"!

![image.png](attachment:image.png)

In [98]:
help(str)

Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(self, format_spec, /)
 |      Return a formatted version of the string as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  

In [99]:
# string to experiment with: place
place = "poolhouse"

# Use upper() on place: place_up
place_up = place.upper()

# Print out place and place_up
print(place)
print(place_up)

# Print out the number of o's in place
print(place.count("o"))

poolhouse
POOLHOUSE
3


### Exercise

**List Methods**

Strings are not the only Python types that have methods associated with them. Lists, floats, integers and booleans are also types that come packaged with a bunch of useful methods. In this exercise, you'll be experimenting with:

- index(), to get the index of the first element of a list that matches its input and
- count(), to get the number of times an element appears in a list.

You'll be working on the list with the area of different parts of a house: areas.

**Instructions**

- Use the index() method to get the index of the element in areas that is equal to 20.0. Print out this index.
- Call count() on areas to find out how many times 9.50 appears in the list. Again, simply print out this number.

![image.png](attachment:image.png)

In [100]:
# Create list areas
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# Print out the index of the element 20.0
print(areas.index(20.0))

# Print out how often 9.50 appears in areas
print(areas.count(9.50))

2
1


### Exercise

**List Methods (2)**

Most list methods will change the list they're called on. Examples are:

- append(), that adds an element to the list it is called on,
- remove(), that removes the first element of a list that matches the input, and
- reverse(), that reverses the order of the elements in the list it is called on.
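
As a minimal sketch of remove(), which the exercise code below does not call, here is a small example with made-up list values. It also shows a common pitfall: these methods change the list in place and return None, so their result should not be assigned back to the list.

```python
# Illustrative list; these values are made up for this sketch
areas = [11.25, 18.0, 20.0, 18.0]

# remove() deletes only the first element that equals its argument
areas.remove(18.0)
print(areas)    # [11.25, 20.0, 18.0]

# In-place methods return None, so keep using the original variable
result = areas.reverse()
print(result)   # None
print(areas)    # [18.0, 20.0, 11.25]
```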

You'll be working on the list with the area of different parts of the house: areas.

**Instructions**

- Use append() twice to add the size of the poolhouse and the garage again: 24.5 and 15.45, respectively. Make sure to add them in this order.
- Print out areas.
- Use the reverse() method to reverse the order of the elements in areas.
- Print out areas once more.

![image.png](attachment:image.png)

In [101]:
# Create list areas
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# Use append twice to add poolhouse and garage size
areas.append(24.5)
areas.append(15.45)


# Print out areas
print(areas)

# Reverse the orders of the elements in areas
areas.reverse()

# Print out areas
print(areas)

[11.25, 18.0, 20.0, 10.75, 9.5, 24.5, 15.45]
[15.45, 24.5, 9.5, 10.75, 20.0, 18.0, 11.25]


### 1. Packages

By now, I hope you're convinced

**2. Motivation**

![image.png](attachment:image.png)

that Python functions and methods are extremely powerful: you can basically use other people's code to solve your own problems. That's amazing! However, adding all functions and methods that have been written up to now to the same Python distribution would be a mess. There would be tons and tons of code in there that you'd never use. Also, maintaining all of this code would be a real pain.

**3. Packages**

![image-2.png](attachment:image-2.png)

This is where packages come into play. You can think of a package as a directory of Python scripts. Each such script is a so-called module. These modules specify functions, methods and new Python types aimed at solving particular problems. There are thousands of Python packages available from the internet. Among them are packages for data science: there's NumPy to efficiently work with arrays, Matplotlib for data visualization, and scikit-learn for machine learning. Not all of these packages are available in Python by default.

**4. Install package**

![image-3.png](attachment:image-3.png)

To use Python packages, you'll first have to install them on your own system, and then put code in your script to tell Python that you want to use these packages. DataCamp already has all necessary packages installed for you, but if you want to install them on your own system, you'll want to use pip, a package management system for Python. If you go to this URL, you can download the file get-pip.py. Next, you go to the terminal and execute python3 get-pip.py. Now you can use pip to actually install a Python package of your choosing. Suppose we want to install the numpy package, which you'll learn about in the next chapter. You type pip3 install numpy. You have to use the commands python3 and pip3 here to tell your system that you're working with Python version 3. Now that the package is installed, you can actually start using it in one of your Python scripts.

[pip documentation v23.3.2 Installation](https://pip.pypa.io/en/stable/installation/)

**5. Import package**

![image-4.png](attachment:image-4.png)

Before you can do this, you should import the package, or a specific module of the package. You can do this with the import statement. To import the entire numpy package, you can do import numpy, like this. A commonly used function in NumPy is array. It takes a list as input. Simply calling the array function like this will generate an error. To refer to the array function from the numpy package, you'll need numpy.array. This time it works. The NumPy array is very useful for data science, but more on that later. Using the numpy prefix all the time can become pretty tiring, so you can also import the package and refer to it by a different name. You can do this by extending your import statement with as, like this. Now, instead of numpy.array, you'll have to use np.array to use NumPy's array function. There are cases in which you only need one specific function of a package. Python allows you to make this explicit in your code. Suppose that we only want to use the array function from the NumPy package. Instead of doing import numpy, you can do from numpy import array, like this. This time, you can simply call the array function like this, with no need for the numpy prefix. This from ... import version, which brings in only specific parts of a package, can be useful to limit the amount of coding, but you're also losing some of the context.
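
The three import styles described above can be sketched as follows. This is only an illustration and assumes NumPy is installed; the list [1, 2, 3] is an arbitrary example input.

```python
# Style 1: import the whole package
import numpy

try:
    array([1, 2, 3])  # the bare name array has not been imported yet
except NameError as err:
    print("NameError:", err)

print(numpy.array([1, 2, 3]))  # works: the function is reached through the package name

# Style 2: import the package under a shorter name
import numpy as np
print(np.array([1, 2, 3]))

# Style 3: import only the array function
from numpy import array
print(array([1, 2, 3]))  # no prefix needed, but the origin of array is less obvious
```

All three calls build the same kind of object. Only the name you type in front of array changes.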

**6. from numpy import array**

![image-5.png](attachment:image-5.png)

Suppose you're working in a long Python script. You import the array function from numpy at the very top, and way later, you actually use this array function. Somebody else who's reading your code might have forgotten that this array function is a specific NumPy function; it's not clear from the function call.

**7. import numpy**

![image-6.png](attachment:image-6.png)

In that respect, the more standard import numpy call is preferred: in this case, your function call is numpy.array, making it very clear that you're working with NumPy.

**8. Let's practice!**

Off to the exercises now, where you can practice different ways of importing packages and modules yourself. You're well on your way to becoming a pythonista data science ninja.

### Exercise

**Import package**

As a data scientist, some notions of geometry never hurt. Let's refresh some of the basics.

For a fancy clustering algorithm, you want to find the circumference, *C*, and area, *A*, of a circle. When the radius of the circle is r, you can calculate *C* and *A* as:

![image.png](attachment:image.png)

In Python, the symbol for exponentiation is \*\*. This operator raises the number to its left to the power of the number to its right. For example 3\*\*4 is 3 to the power of 4 and will give 81.

To use the constant pi, you'll need the math package. A variable r is already coded in the script. Fill in the code to calculate C and A and see how the [print()](https://docs.python.org/3/library/functions.html#print) functions create some nice printouts.

**Instructions**

- Import the math package. Now you can access the constant pi with math.pi.
- Calculate the circumference of the circle and store it in C.
- Calculate the area of the circle and store it in A.

![image-2.png](attachment:image-2.png)

In [102]:
# Import the math package
import math

# Definition of radius
r = 0.43

# Calculate C
C = 2*math.pi*r

# Calculate A
A = math.pi*r**2

# Build printout
print("Circumference: " + str(C))
print("Area: " + str(A))

Circumference: 2.701769682087222
Area: 0.5808804816487527


### Exercise

**Selective import**

General imports, like "import math", make all functionality from the math package available to you. However, if you decide to only use a specific part of a package, you can always make your import more selective:

- from math import pi

Let's say the Moon's orbit around planet Earth is a perfect circle, with a radius r (in km) that is defined in the script.

**Instructions**

- Perform a selective import from the math package where you only import the radians function.
- Calculate the distance travelled by the Moon over 12 degrees of its orbit. Assign the result to dist. You can calculate this as r * phi, where r is the radius and phi is the angle in radians. To convert an angle in degrees to an angle in radians, use the radians() function, which you just imported.
- Print out dist.

![image.png](attachment:image.png)

In [103]:
# Import radians function of math package
from math import radians

# Definition of radius
r = 192500

# Travel distance of Moon over 12 degrees. Store in dist.
phi = radians(12)
dist = r * phi

# Print out dist
print(dist)

40317.10572106901


### Exercise

**Different ways of importing**

There are several ways to import packages and modules into Python. Depending on the import call, you'll have to use different Python code.

Suppose you want to use the function [inv()](https://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.linalg.inv.html), which is in the linalg subpackage of the scipy package. You want to be able to use this function as follows:

- my_inv([[1,2], [3,4]])

Which import statement will you need in order to run the above code without an error?

**Instructions**

![image.png](attachment:image.png)
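
The renaming pattern this exercise asks about can be tried without SciPy. The sketch below uses the standard-library math module as a stand-in, and my_sqrt and m are hypothetical names chosen only for illustration.

```python
# Alias for a whole module: every call goes through the new name m
import math as m
print(m.sqrt(16))     # 4.0

# Alias for a single function: my_sqrt is a made-up name for math.sqrt
from math import sqrt as my_sqrt
print(my_sqrt(16))    # 4.0
```

With the second form, only the alias is available in your script. The original name sqrt is not defined unless you import it separately.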

## <a id="4"></a>
<font color="lightseagreen" size=+2.5><b>4. Numpy</b></font>

<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Table of Contents</a>

NumPy is a fundamental Python package to efficiently practice data science. Learn to work with powerful tools in the NumPy array, and get started with data exploration.

### 1. NumPy

Wow, you've done well and by now, you are aware

**2. Lists Recap**

![image.png](attachment:image.png)

that the Python list is pretty powerful. A list can hold any type and can hold different types at the same time. You can also change, add and remove elements. This is wonderful, but one feature is missing, a feature that is super important for aspiring data scientists such as yourself. When analyzing data, you'll often want to carry out operations over entire collections of values, and you want to do this fast. With lists, this is a problem.

**3. Illustration**

![image-2.png](attachment:image-2.png)

Let's return to the heights of your family and yourself. Suppose you've also asked for everybody's weight. It's not very polite, but everything for science, right? You end up with two lists, height and weight. The first person is 1.73 meters tall and weighs 65.4 kilograms. If you now want to calculate the Body Mass Index for each family member, you'd hope that this call can work, making the calculations element-wise. Unfortunately, Python raises an error, because it has no idea how to do calculations on lists. You could solve this by going through each list element one after the other, and calculating the BMI for each person separately, but this is terribly inefficient and tiresome to write.
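
A rough sketch of the problem described above, using plain Python only. Just the first height and weight (1.73 and 65.4) come from the text; the other values are made up for illustration.

```python
# Heights in meters and weights in kilograms (only the first pair is from the text)
height = [1.73, 1.68, 1.71]
weight = [65.4, 59.2, 63.6]

# The call you would hope works: lists do not support this arithmetic
try:
    bmi = weight / height ** 2
except TypeError as err:
    print("TypeError:", err)

# The workaround: go through both lists one person at a time
bmi = []
for h, w in zip(height, weight):
    bmi.append(w / h ** 2)

print(bmi)
```

The loop gives the right answer, but it takes several lines for what is conceptually a single calculation.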

**4. Solution: NumPy**

![image-3.png](attachment:image-3.png)

A way more elegant solution is to use NumPy, or Numerical Python. It's a Python package that, among other things, provides an alternative to the regular Python list: the NumPy array. The NumPy array is pretty similar to the list, but has one additional feature: you can perform calculations over entire arrays. It's really easy, and super-fast as well. The NumPy package is already installed on DataCamp's servers, but if you want to work with it on your own system, go to the command line and execute pip3 install numpy. Next,

**5. NumPy**

![image-4.png](attachment:image-4.png)

to actually use NumPy in your Python session, you can import the numpy package, like this. Let's start with creating a NumPy array. You do this with NumPy's array function: the input is a regular Python list. I'm using array twice here, to create NumPy versions of the height and weight lists from before: np_height and np_weight. Let's try to calculate everybody's BMI with a single call again. This time, it worked fine: the calculations were performed element-wise. The first person's BMI was calculated by dividing the first element in np_weight by the square of the first element in np_height, the second person's BMI was calculated with the second height and weight elements, and so on.
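
The same calculation with NumPy arrays might look like the sketch below. It assumes NumPy is installed, and apart from the first pair (1.73 and 65.4) the values are made up.

```python
import numpy as np

# Illustrative values; only the first height and weight come from the text
height = [1.73, 1.68, 1.71]
weight = [65.4, 59.2, 63.6]

# Convert the regular lists to NumPy arrays
np_height = np.array(height)
np_weight = np.array(weight)

# One expression, evaluated element by element
bmi = np_weight / np_height ** 2
print(bmi)
```

Each element of bmi is the weight at that position divided by the square of the height at the same position.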

**6. Comparison**

![image-5.png](attachment:image-5.png)

Let's do a quick comparison here. First, we tried to do calculations with regular lists, like this, but this gave us an error, because Python doesn't know how to do calculations with lists the way we want it to. Next, these regular lists were converted to NumPy arrays. The same operations now work without any problem: NumPy knows how to work with arrays as if they are single values, which is pretty awesome if you ask me.

**7. NumPy: remarks**

![image-6.png](attachment:image-6.png)

You should still pay attention, though. First of all, NumPy can do all of this so easily because it assumes that your NumPy array can only contain values of a single type. It's either an array of floats or an array of booleans, and so on. If you do try to create an array with different types, like this for example, the resulting NumPy array will contain a single type, string in this case. The boolean and the float were both converted to strings. Second, you should know that a NumPy array is simply a new kind of Python type, like the float, string and list types from before. This means that it comes with its own methods, which can behave differently than you'd expect.
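
A small sketch of this single-type rule, assuming NumPy is installed. The three mixed values are an illustrative choice and may differ from the ones on the slide.

```python
import numpy as np

# A float, a string and a boolean in one list
mixed = np.array([1.0, "is", True])

print(mixed)         # every element is now a string
print(mixed.dtype)   # a Unicode string dtype
print(type(mixed))   # the array itself is a numpy.ndarray
```

Because one of the inputs is a string, NumPy picks a string type that can represent all three values.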

**8. NumPy: remarks**

![image-7.png](attachment:image-7.png)

Take this Python list and this numpy array, for example. If you do python_list + python_list, the list elements are pasted together, generating a list with 6 elements. If you do this with the numpy arrays, on the other hand, Python will do an element-wise sum of the arrays. Just make sure to pay attention when you're juggling around with different Python types, because the outcomes can differ a lot! Apart from these subtleties,

**9. NumPy Subsetting**

![image-8.png](attachment:image-8.png)

you can work with NumPy arrays pretty much the same as you can with regular Python lists. When you want to get elements from your array, for example, you can use square brackets. Suppose you want to get the BMI for the second person, so at index 1. This will do the trick. Specifically for NumPy, there's also another way to do subsetting: using an array of booleans. Say you want to get all BMI values in the bmi array that are over 23. A first step is using the greater than sign, like this. The result is a NumPy array containing booleans: True if the corresponding BMI is above 23, False if it isn't. Next, you can use this boolean array inside square brackets to do subsetting. Only the elements in bmi that are above 23, so for which the corresponding boolean value is True, are selected. There's only one BMI that's above 23, so we end up with a NumPy array with a single value, that specific BMI. Using the result of a comparison to make a selection of your data is a very common way to get surprising insights.
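
Both kinds of subsetting can be sketched as follows, assuming NumPy is installed. The BMI values are made up so that exactly one of them is above 23, and over_23 is a hypothetical variable name.

```python
import numpy as np

# Illustrative BMI values, chosen so that only one is above 23
bmi = np.array([21.85, 20.98, 21.75, 24.75, 21.44])

# Square brackets with an index, just like a list
print(bmi[1])        # 20.98

# A comparison gives a boolean array of the same length
over_23 = bmi > 23
print(over_23)

# The boolean array selects the elements where it is True
print(bmi[over_23])  # [24.75]
```

The two steps are often combined into a single expression, bmi[bmi > 23].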

**10. Let's practice!**

Learn all about it and the other NumPy basics in the exercises!

### Exercise

**Your First NumPy Array**

In this chapter, we're going to dive into the world of baseball. Along the way, you'll get comfortable with the basics of numpy, a powerful package to do data science.

A list baseball has already been defined in the Python script, representing the height of some baseball players in centimeters. Can you add some code here and there to create a numpy array from it?

**Instructions**

- Import the numpy package as np, so that you can refer to numpy with np.
- Use np.array() to create a numpy array from baseball. Name this array np_baseball.
- Print out the type of np_baseball to check that you got it right.

![image.png](attachment:image.png)

In [104]:
# Import the numpy package as np
import numpy as np

# Create list baseball
baseball = [180, 215, 210, 210, 188, 176, 209, 200]

# Create a numpy array from baseball: np_baseball
np_baseball = np.array(baseball)

# Print out type of np_baseball
print(type(np_baseball))

<class 'numpy.ndarray'>


### Exercise

**Baseball players' height**

You are a huge baseball fan. You decide to call the MLB (Major League Baseball) and ask around for some more statistics on the height of the main players. They pass along data on more than a thousand players, which is stored as a regular Python list: height_in. The height is expressed in inches. Can you make a numpy array out of it and convert the units to meters?

height_in is already available and the numpy package is loaded, so you can start straight away (Source: stat.ucla.edu).

**Instructions**

- Create a numpy array from height_in. Name this new array np_height_in.
- Print np_height_in.
- Multiply np_height_in with 0.0254 to convert all height measurements from inches to meters. Store the new values in a new array, np_height_m.
- Print out np_height_m and check if the output makes sense.

![image.png](attachment:image.png)

In [106]:
print(df_baseball.columns)
print(df_baseball.shape)

Index(['Name', 'Team', 'Position', 'Height', 'Weight', 'Age', 'PosCategory'], dtype='object')
(1015, 7)


In [107]:
height_in = df_baseball.Height.to_list()

In [108]:
# Import numpy
import numpy as np

# Create a numpy array from height_in: np_height_in
np_height_in = np.array(height_in)

# Print out np_height_in
print(np_height_in)

# Convert np_height_in to m: np_height_m
np_height_m = np_height_in * 0.0254

# Print np_height_m
print(np_height_m)

[74 74 72 ... 75 75 73]
[1.8796 1.8796 1.8288 ... 1.905  1.905  1.8542]


### Exercise

**Baseball player's BMI**

The MLB also offers to let you analyze their weight data. Again, both are available as regular Python lists: height_in and weight_lb. height_in is in inches and weight_lb is in pounds.

It's now possible to calculate the BMI of each baseball player. Python code to convert height_in to a numpy array with the correct units is already available in the workspace. Follow the instructions step by step and finish the game! height_in and weight_lb are available as regular lists.

**Instructions**

- Create a numpy array from the weight_lb list with the correct units. Multiply by 0.453592 to go from pounds to kilograms. Store the resulting numpy array as np_weight_kg.
- Use np_height_m and np_weight_kg to calculate the BMI of each player. Use the following equation:
![image.png](attachment:image.png)
- Save the resulting numpy array as bmi.
- Print out bmi.

![image-2.png](attachment:image-2.png)

In [109]:
weight_lb = df_baseball.Weight.to_list()

In [111]:
# Import numpy
import numpy as np

# Create array from height_in with metric units: np_height_m
np_height_m = np.array(height_in) * 0.0254

# Create array from weight_lb with metric units: np_weight_kg
np_weight_kg = np.array(weight_lb) * 0.453592

# Calculate the BMI: bmi
bmi = np_weight_kg / (np_height_m ** 2)

# Print out bmi
print(bmi)

[23.11037639 27.60406069 28.48080465 ... 25.62295933 23.74810865
 25.72686361]


### Exercise

**Lightweight baseball players**

To subset both regular Python lists and numpy arrays, you can use square brackets:

![image.png](attachment:image.png)

For numpy specifically, you can also use boolean numpy arrays:

![image-2.png](attachment:image-2.png)

The code that calculates the BMI of all baseball players is already included. Follow the instructions and reveal interesting things from the data! height_in and weight_lb are available as regular lists.

**Instructions**

- Create a boolean numpy array: the element of the array should be True if the corresponding baseball player's BMI is below 21. You can use the < operator for this. Name the array light.
- Print the array light.
- Print out a numpy array with the BMIs of all baseball players whose BMI is below 21. Use light inside square brackets to do a selection on the bmi array.

![image-3.png](attachment:image-3.png)

In [112]:
# Import numpy
import numpy as np

# Calculate the BMI: bmi
np_height_m = np.array(height_in) * 0.0254
np_weight_kg = np.array(weight_lb) * 0.453592
bmi = np_weight_kg / np_height_m ** 2

# Create the light array
light = bmi < 21

# Print out light
print(light)

# Print out BMIs of all baseball players whose BMI is below 21
print(bmi[light])

[False False False ... False False False]
[20.54255679 20.54255679 20.69282047 20.69282047 20.34343189 20.34343189
 20.69282047 20.15883472 19.4984471  20.69282047 20.9205219 ]


### Exercise

**NumPy Side Effects**

As Hugo explained before, numpy is great for doing vector arithmetic. If you compare its functionality with regular Python lists, however, some things have changed.

First of all, numpy arrays cannot contain elements with different types. If you try to build such an array, some of the elements' types are changed to end up with a homogeneous array. This is known as type coercion.

Second, the typical arithmetic operators, such as +, -, * and / have a different meaning for regular Python lists and numpy arrays.

Have a look at this line of code:

- np.array([True, 1, 2]) + np.array([3, 4, False])

Can you tell which code chunk builds the exact same Python object? The numpy package is already imported as np, so you can start experimenting in the IPython Shell straight away!

**Instructions**

![image.png](attachment:image.png)

In [113]:
np.array([True, 1, 2]) + np.array([3, 4, False])

array([4, 5, 2])
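
To see where this result comes from, the sketch below (assuming NumPy is installed) looks at each array before adding them, and compares with what + does on regular lists. The names left and right are hypothetical.

```python
import numpy as np

left = np.array([True, 1, 2])
right = np.array([3, 4, False])

# Type coercion: the booleans become integers (True -> 1, False -> 0)
print(left)           # [1 1 2]
print(right)          # [3 4 0]

# + on arrays adds element by element
print(left + right)   # [4 5 2]

# + on regular lists pastes them together instead
print([True, 1, 2] + [3, 4, False])   # [True, 1, 2, 3, 4, False]
```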

### Exercise

**Subsetting NumPy Arrays**

You've seen it with your own eyes: Python lists and numpy arrays sometimes behave differently. Luckily, there are still certainties in this world. For example, subsetting (using the square bracket notation on lists or arrays) works exactly the same. To see this for yourself, try the following lines of code in the IPython Shell:

![image.png](attachment:image.png)

The script in the editor already contains code that imports numpy as np, and stores both the height and weight of the MLB players as numpy arrays. height_in and weight_lb are available as regular lists.

**Instructions**

- Subset np_weight_lb by printing out the element at index 50.
- Print out a sub-array of np_height_in that contains the elements at index 100 up to and including index 110.

![image-2.png](attachment:image-2.png)

In [114]:
# Import numpy
import numpy as np

# Store weight and height lists as numpy arrays
np_weight_lb = np.array(weight_lb)
np_height_in = np.array(height_in)

# Print out the weight at index 50
print(np_weight_lb[50])

# Print out sub-array of np_height_in: index 100 up to and including index 110
print(np_height_in[100:111])

200
[73 74 72 73 69 72 73 75 75 73 72]
