# Python Lists

Imagine you're organizing a family picnic and need to keep track of everyone's contributions. Instead of jotting down each item separately, you can use a Python list to group them together. Here's how you might do it:

Items: sandwiches, drinks, fruits, games

Contributors: John, Sarah, Mom, Dad

You can create a list of lists to pair each item with its contributor:
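Here is a minimal sketch of that pairing. It assumes the items and contributors are matched in the order listed above; the variable name picnic is only illustrative.

```python
# Assumption: each item is paired with the contributor in the same position
picnic = [
    ["sandwiches", "John"],
    ["drinks", "Sarah"],
    ["fruits", "Mom"],
    ["games", "Dad"],
]

print(picnic)
print(picnic[1][1])  # Sarah, who brings the drinks
```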

## Create a list
A list is a compound data type; you can group values together, like this:



In [None]:
a = "is"
b = "nice"
my_list = ["my", "list", a, b]


After measuring the height of your family, you decide to collect some information on the house you're living in. The areas of the different parts of your house are stored in separate variables in the exercise.

#### Instructions

>Create a list, areas, that contains the area of the hallway (hall), kitchen (kit), living room (liv), bedroom (bed) and bathroom (bath), in this order. Use the predefined variables.

>Print areas with the print() function.



In [None]:
hall = 11.25
kit = 18.0
liv = 20.0
bed = 10.75
bath = 9.50

# Create list areas
areas = [hall, kit, liv, bed, bath]

# Print areas
print(areas)

[11.25, 18.0, 20.0, 10.75, 9.5]


### Create lists with different types
Although it's not really common, a list can also contain a mix of Python types including strings, floats, and booleans.

You're now going to add the room names to your list, so you can easily see both the room name and size together.


#### Instructions

>Finish the code that creates the areas list.

>Build the list so that the list first contains the name of each room as a string and then its area. 

>In other words, add the strings "hallway", "kitchen" and "bedroom" at the appropriate locations.

>Print areas again; is the printout more informative this time?

In [None]:
hall = 11.25
kit = 18.0
liv = 20.0
bed = 10.75
bath = 9.50

# Adapt list areas
areas = ["hallway", hall, "kitchen", kit, "living room", liv, "bedroom", bed, "bathroom", bath]

# Print areas
print(areas)

['hallway', 11.25, 'kitchen', 18.0, 'living room', 20.0, 'bedroom', 10.75, 'bathroom', 9.5]



Some of the code has been provided to get you started. Pay attention here! "bathroom" is a string, while bath is a variable that represents the float 9.50 you specified earlier.

### List of lists
As a data scientist, you'll often be dealing with a lot of data, and it will make sense to group some of this data.

Instead of creating a list containing strings and floats, representing the names and areas of the rooms in your house, you can create a list of lists.


#### Instructions

>Finish the list of lists so that it also contains the bedroom and bathroom data. 

>Make sure you enter these in order!

>Print out house; does this way of structuring your data make more sense?

In [None]:
hall = 11.25
kit = 18.0
liv = 20.0
bed = 10.75
bath = 9.50

# House information as list of lists
house = [["hallway", hall],
         ["kitchen", kit],
         ["living room", liv],
         ["bedroom", bed],
         ["bathroom", bath]]

# Print out house
print(house)


Remember: "hallway" is a string, while hall is a variable that represents the float 11.25 you specified earlier.

## Subsetting Lists

In this lesson, you'll learn how to access and manipulate elements within a Python list using indexing and slicing. Here's what you'll explore:

>Indexing: Access elements using positive and negative indexes. For example, fam[3] retrieves the fourth element, while fam[-1] gets the last element.

>Slicing: Select multiple elements to create a new list. Use a colon to specify the range, like fam[1:4] for elements with indexes 1, 2, and 3.
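To make this concrete, here is a short sketch using the fam list that appears later in these notes (names alternating with heights in meters):

```python
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]

# Indexing: a single element
print(fam[3])    # 1.68, the fourth element (index 3)
print(fam[-1])   # 1.89, the last element

# Slicing: a new list with indexes 1, 2 and 3 (index 4 is excluded)
print(fam[1:4])  # [1.73, 'emma', 1.68]
```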


### Subset and conquer

Subsetting Python lists is a piece of cake. Take the code sample below, which creates a list x and then selects "b" from it. Remember that this is the second element, so it has index 1. You can also use negative indexing.


In [None]:

x = ["a", "b", "c", "d"]
x[1]
x[-3] # same result!


'b'

Remember the areas list from before, containing both strings and floats? Its definition is already in the script. Can you add the correct code to do some Python subsetting?

#### Instructions

>Print out the second element from the areas list (it has the value 11.25).

>Subset and print out the last element of areas, being 9.50. Using a negative index makes sense here!

>Select the number representing the area of the living room (20.0) and print it out.

In [None]:
# Create the areas list
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]

# Print out second element from areas
print(areas[1])

# Print out last element from areas
print(areas[-1])

# Print out the area of the living room
print(areas[5])

11.25
9.5
20.0


### Slicing and dicing

Selecting single values from a list is just one part of the story. It's also possible to slice your list, which means selecting multiple elements from your list. Use the following syntax:


In [None]:
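# General form: my_list[start:end]
my_list[1:3]  # ['list', 'is'], using my_list = ["my", "list", "is", "nice"] from above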


The start index will be included, while the end index is not. However, it's also possible not to specify these indexes. If you don't specify the start index, Python figures out that you want to start your slice at the beginning of your list; if you don't specify the end index, the slice continues to the end of your list.
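Here is a short sketch of leaving out one or both indexes, using the areas list from these exercises:

```python
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0,
         "bedroom", 10.75, "bathroom", 9.50]

print(areas[:4])  # no start: ['hallway', 11.25, 'kitchen', 18.0]
print(areas[8:])  # no end: ['bathroom', 9.5]
print(areas[:])   # neither: a new list with every element
```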

#### Instructions

>Use slicing to create a list, downstairs, that contains the first 6 elements of areas.

>Create upstairs, as the last 4 elements of areas. This time, simplify the slicing by omitting the end index.

>Print both downstairs and upstairs using print().

In [None]:
# Create the areas list
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]

# Use slicing to create downstairs
downstairs = areas[:6]

# Use slicing to create upstairs (omitting the end index)
upstairs = areas[6:]

# Print out downstairs and upstairs
print(downstairs)
print(upstairs)


['hallway', 11.25, 'kitchen', 18.0, 'living room', 20.0]
['bedroom', 10.75, 'bathroom', 9.5]


### Subsetting lists of lists
A Python list can also contain other lists.

To subset lists of lists, you can use the same technique as before: square brackets. For a list of lists called house, it looks like this:


In [None]:

house[2][0]
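Chained indexing works one step at a time. The first pair of brackets picks an inner list, and the second pair picks an element from that inner list. Here is a sketch with the house list from earlier:

```python
house = [["hallway", 11.25],
         ["kitchen", 18.0],
         ["living room", 20.0],
         ["bedroom", 10.75],
         ["bathroom", 9.50]]

room = house[2]     # ['living room', 20.0], an inner list
print(room[0])      # living room
print(house[2][0])  # the same result in one expression
print(house[1][1])  # 18.0, the kitchen area
```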

#### Instructions

>Subset the house list to get the float 9.5.

In [None]:
house = [["hallway", 11.25],
         ["kitchen", 18.0],
         ["living room", 20.0],
         ["bedroom", 10.75],
         ["bathroom", 9.50]]

# Subset the house list
house[4][1]

9.5

## Manipulating Lists

Dive into the world of Python list manipulation. This lesson covers how to modify, add, and remove elements in a list, and explores the nuances of list references in memory.



>Update list elements using square brackets and the equals sign.

>Add elements by using the plus operator to concatenate lists.

>Remove elements with the del keyword.

>Understand how lists are stored in memory and how references work.



In [None]:
# Here's a quick code snippet to illustrate:

fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]
fam[7] = 1.86  # Update dad's height
del fam[2]     # Remove "emma" (her height, 1.68, stays in the list)

# Explore these concepts further in the exercises.

### Replace list elements

To replace list elements, you subset the list and assign new values to the subset. You can select single elements or you can change entire list slices at once.
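For example, assigning a list to a slice replaces every element in that slice in one step. Here is a minimal sketch on a short, hypothetical list x:

```python
x = ["a", "b", "c", "d"]

x[1] = "r"          # replace a single element
x[2:] = ["s", "t"]  # replace the slice with indexes 2 and 3
print(x)            # ['a', 'r', 's', 't']
```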

For this and the following exercises, you'll continue working on the areas list that contains the names and areas of different rooms in a house.



#### Instructions

>Update the area of the bathroom to be 10.50 square meters instead of 9.50 using negative indexing.

>Make the areas list more trendy! Change "living room" to "chill zone". Don't use negative indexing this time.

In [None]:
# Create the areas list
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]

# Correct the bathroom area (negative index: the last element)
areas[-1] = 10.50

# Change "living room" to "chill zone"
areas[4] = "chill zone"

print(areas[-1])
print(areas[4])

10.5
chill zone


### Extend a list
If you can change elements in a list, you'll surely want to be able to add elements to it, right? You can use the + operator:


In [None]:
x = ["a", "b", "c", "d"]
y = x + ["e", "f"]  # y is ['a', 'b', 'c', 'd', 'e', 'f']; x is unchanged


You just won the lottery, awesome! You decide to build a poolhouse and a garage. Can you add the information to the areas list?


#### Instructions

>Use the + operator to paste the list ["poolhouse", 24.5] to the end of the areas list.

>Store the resulting list as areas_1.

>Further extend areas_1 by adding data on your garage. 

>Add the string "garage" and float 15.45. Name the resulting list areas_2.

In [None]:
# Create the areas list and make some changes
areas = ["hallway", 11.25, "kitchen", 18.0, "chill zone", 20.0,
         "bedroom", 10.75, "bathroom", 10.50]

# Add poolhouse data to areas, new list is areas_1
areas_1 = areas + ["poolhouse", 24.5]

# Add garage data to areas_1, new list is areas_2
areas_2 = areas_1 + ["garage", 15.45]

print(areas)
print(areas_1)
print(areas_2)

['hallway', 11.25, 'kitchen', 18.0, 'chill zone', 20.0, 'bedroom', 10.75, 'bathroom', 10.5]
['hallway', 11.25, 'kitchen', 18.0, 'chill zone', 20.0, 'bedroom', 10.75, 'bathroom', 10.5, 'poolhouse', 24.5]
['hallway', 11.25, 'kitchen', 18.0, 'chill zone', 20.0, 'bedroom', 10.75, 'bathroom', 10.5, 'poolhouse', 24.5, 'garage', 15.45]


### Delete list elements
Finally, you can also remove elements from your list. You can do this with the del statement:


In [None]:

x = ["a", "b", "c", "d"]
del x[1]


Pay attention here: as soon as you remove an element from a list, the indexes of the elements that come after the deleted element all change!
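The index shift is easy to see on the same four-element list. This sketch looks at index 2 before and after a deletion:

```python
x = ["a", "b", "c", "d"]
print(x[2])   # c

del x[1]      # remove "b"; everything after it moves one position left
print(x)      # ['a', 'c', 'd']
print(x[2])   # d, because "c" now sits at index 1
```

This is one reason to delete a name/area pair with a single slice, as in del areas[10:12] below. Both elements are removed in one step, so you don't have to track a shifted index.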



Unfortunately, the amount you won with the lottery is not that big after all and it looks like the poolhouse isn't going to happen. You'll need to remove it from the list. You decide to remove the corresponding string and float from the areas list.


#### Instructions

>Delete the string and float for the "poolhouse" from your areas list.

>Print the updated areas list.

In [None]:
areas = ["hallway", 11.25, "kitchen", 18.0,
         "chill zone", 20.0, "bedroom", 10.75,
         "bathroom", 10.50, "poolhouse", 24.5,
         "garage", 15.45]

print(areas)
# Delete the poolhouse items from the list
del areas[10:12]

# Print the updated list
print(areas)

['hallway', 11.25, 'kitchen', 18.0, 'chill zone', 20.0, 'bedroom', 10.75, 'bathroom', 10.5, 'poolhouse', 24.5, 'garage', 15.45]
['hallway', 11.25, 'kitchen', 18.0, 'chill zone', 20.0, 'bedroom', 10.75, 'bathroom', 10.5, 'garage', 15.45]


### Inner workings of lists
Some code has been provided for you in this exercise: a list with the name areas and a copy named areas_copy.

Currently, the first element in the areas_copy list is changed and the areas list is printed out. If you hit the run code button you'll see that, although you've changed areas_copy, the change also takes effect in the areas list. That's because areas and areas_copy point to the same list.

If you want to prevent changes in areas_copy from also taking effect in areas, you'll have to do a more explicit copy of the areas list with list() or by using [:].
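Here is a sketch of the difference. It uses the is operator to check whether two names refer to the same list object:

```python
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

same = areas          # a second name for the same list
copy1 = list(areas)   # explicit copy with list()
copy2 = areas[:]      # explicit copy with a full slice

print(same is areas)   # True
print(copy1 is areas)  # False
print(copy2 is areas)  # False

same[0] = 5.0          # this also changes areas
print(areas[0])        # 5.0
print(copy1[0])        # 11.25: the copy is unaffected
```

Both list() and [:] make shallow copies. For a list of lists such as house, the outer list is new, but the inner lists are still shared.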



#### Instructions

>Change the second command, that creates the variable areas_copy, such that areas_copy is an explicit copy of areas. 

>After your edit, changes made to areas_copy shouldn't affect areas. 

>Submit the answer to check this.

In [None]:
# Create list areas
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# Change this command
areas_copy = areas[:]
print(areas_copy)

# Change areas_copy
areas_copy[0] = 5.0

# Print areas
print(areas)
print(areas_copy)

# Function

This lesson introduces the concept of functions in Python, emphasizing their role in simplifying code by reusing predefined operations. Python comes with the following built-in functions:


In [None]:
# Names of Python's built-in functions. They are listed as text because calling
# many of them with no arguments, e.g. abs(), would raise a TypeError.
builtin_functions = [
    "abs", "aiter", "all", "anext", "any", "ascii", "bin", "bool",
    "breakpoint", "bytearray", "bytes", "callable", "chr", "classmethod",
    "compile", "complex", "delattr", "dict", "dir", "divmod", "enumerate",
    "eval", "exec", "filter", "float", "format", "frozenset", "getattr",
    "globals", "hasattr", "hash", "help", "hex", "id", "input", "int",
    "isinstance", "issubclass", "iter", "len", "list", "locals", "map",
    "max", "memoryview", "min", "next", "object", "oct", "open", "ord",
    "pow", "print", "property", "range", "repr", "reversed", "round", "set",
    "setattr", "slice", "sorted", "staticmethod", "str", "sum", "super",
    "tuple", "type", "vars", "zip", "__import__",
]


Key Points:



>Functions are reusable code blocks for specific tasks.

>Built-in functions like type, max, and round make coding more efficient.

>max finds the highest value in a list.

>round can round numbers to a specified decimal place or nearest integer.

>Functions can have optional arguments, as seen with round.


In [None]:

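# Code example:
fam = [1.89, 1.76, 1.68]
tallest = max(fam)              # 1.89
rounded_value = round(1.68, 1)  # 1.7 (rounded to one decimal place)
rounded_int = round(1.68)       # 2 (no second argument: nearest integer)
print(tallest, rounded_value, rounded_int)

# Explore Python's documentation and online resources to discover more functions.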

#### Real-Life Example
Imagine you're organizing a family picnic and you need to find the tallest person to help hang decorations. Instead of comparing everyone's heights by hand, you can use Python's max function to find the tallest height in a list of family members' heights.



Create a list of heights: 

fam = [1.75, 1.68, 1.82, 1.89]

Use max to find the tallest: tallest = max(fam)


In [None]:
fam = [1.75, 1.68, 1.82, 1.89]
tallest = max(fam)
print(tallest)  # Output: 1.89


1.89


## Familiar functions
Out of the box, Python offers a bunch of built-in functions to make your life as a data scientist easier. You already know two such functions: print() and type(). There are also functions like str(), int(), bool() and float() to switch between data types; these are built-in functions as well.
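Here is a quick sketch of those type-conversion functions. The input values are arbitrary examples:

```python
as_text = str(3.0)       # '3.0'
as_int = int("42")       # 42
truncated = int(2.9)     # 2: int() drops the fractional part, it does not round
as_float = float("1.5")  # 1.5
zero_flag = bool(0)      # False
text_flag = bool("hi")   # True: any non-empty string counts as True

print(as_text, as_int, truncated, as_float, zero_flag, text_flag)
```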



Calling a function is easy. To get the type of 3.0 and store the output as a new variable, result, you can use the following:


In [None]:

result = type(3.0)

print(result)  # Output: <class 'float'>


<class 'float'>


#### Instructions

>Use print() in combination with type() to print out the type of var1.

>Use len() to get the length of the list var1. Wrap it in a print() call to directly print it out.

>Use int() to convert var2 to an integer. Store the output as out2.

In [None]:
# Create variables var1 and var2
var1 = [1, 2, 3, 4]
var2 = True

# Print out type of var1
print(type(var1))

# Print out length of var1
print(len(var1))

# Convert var2 to an integer: out2
out2 = int(var2)
print(out2)
print(type(var2))
print(type(out2))



## Help!
Maybe you already know the name of a Python function, but you still have to figure out how to use it. Ironically, you have to ask for information about a function with another function: help(). In IPython specifically, you can also use ? before the function name.



To get help on the max() function, for example, you can use one of these calls:


In [None]:

help(max)  # works in any Python session
# IPython/Jupyter only:
?max


Use the IPython Shell to open up the documentation on pow(). Do this by typing ?pow or help(pow) and hitting Enter.

Which of the following statements is true? (Check the documentation printed in the shell.)

Possible answers


>pow() takes three arguments: base, exp, and mod. Without mod, the function will return an error.

>pow() takes three required arguments: base, exp, and None.

>pow() requires base and exp arguments; mod is optional. <<=====

>pow() takes two arguments: exp and mod. Missing exp results in an error.

Reading the documentation this way saves time and helps you call functions correctly.
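Here is a quick sketch of the correct answer in action. It calls pow() with only base and exp, and then adds the optional mod argument (the numbers are arbitrary):

```python
print(pow(2, 10))        # 1024, the same as 2 ** 10
print(pow(2, 10, 1000))  # 24, the same as (2 ** 10) % 1000 but computed more efficiently
```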

## Multiple arguments
In the previous exercise, you identified optional arguments by viewing the documentation with help(). You'll now apply this to change the behavior of the sorted() function.

Have a look at the documentation of sorted() by typing help(sorted) in the IPython Shell.

You'll see that sorted() takes three arguments: iterable, key, and reverse. In this exercise, you'll only have to specify iterable and reverse, not key.

Two lists have been created for you.

Can you paste them together and sort them in descending order?


#### Instructions

>Use + to merge the contents of first and second into a new list: full.

>Call sorted() on full and specify the reverse argument to be True. Save the sorted list as full_sorted.

>Finish off by printing out full_sorted.






In [None]:

# angka is Indonesian for "numbers"; urut means "sorted"
angka = [1, 2, 4, 3, 2, 5]
number = angka + [10, 9, 8, 7, 6]
angkaurut = sorted(angka, reverse=True)     # descending
numberurut = sorted(number, reverse=False)  # ascending (the default)

print(angka)
print(number)
print(angkaurut)
print(numberurut)

[1, 2, 4, 3, 2, 5]
[1, 2, 4, 3, 2, 5, 10, 9, 8, 7, 6]
[5, 4, 3, 2, 2, 1]
[1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10]


In [None]:
# Create lists first and second
first = [11.25, 18.0, 20.0]
second = [10.75, 9.50]

# Paste together first and second: full
full = first + second

# Sort full in descending order: full_sorted
full_sorted = sorted(full, reverse=True)

# Print out full_sorted
print(full_sorted)

[20.0, 18.0, 11.25, 10.75, 9.5]


## String Methods
Strings come with a bunch of methods. Follow the instructions closely to discover some of them. If you want to discover them in more detail, you can always type help(str) in the IPython Shell.

A string place has already been created for you to experiment with.



#### Instructions

>Use the .upper() method on place and store the result in place_up. 

>Use the syntax for calling methods that you learned in the previous video.

>Print out place and place_up. Did both change?

>Print out the number of o's in the variable place by calling .count() on place and passing the letter 'o' as an input to the method.

>We're talking about the variable place, not the word "place"!

In [None]:
# string to experiment with: place
place = "poolhouse"

# Use upper() on place
place_up = place.upper()

# Print out place and place_up
print(place)
print(place_up)


# Print out the number of o's in place
print(place.count("o"))


poolhouse
POOLHOUSE
3
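Calling a method on the string (place.upper()) and calling the same function through the str type (str.upper(place)) give the same result. Neither changes place, because Python strings can't be modified in place. A sketch:

```python
place = "poolhouse"

place_up = place.upper()             # method-call syntax
print(place_up == str.upper(place))  # True: same result through the str type
print(place)                         # poolhouse: the original is unchanged
print(place.count("o"))              # 3
```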

## List Methods
Strings are not the only Python types that have methods associated with them. Lists, floats, integers and booleans are also types that come packaged with a bunch of useful methods. In this exercise, you'll be experimenting with:

>.index(), to get the index of the first element of a list that matches its input and

>.count(), to get the number of times an element appears in a list.
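.index() and .count() behave differently when the element isn't in the list: .count() simply returns 0, while .index() raises a ValueError. Here is a minimal sketch on the areas values, where 99.0 is just a value that isn't present:

```python
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

print(areas.index(18.0))  # 1
print(areas.count(18.0))  # 1
print(areas.count(99.0))  # 0: a missing element is counted zero times

try:
    areas.index(99.0)     # .index() has no position to return here
except ValueError as err:
    print("ValueError:", err)
```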

You'll be working on the list with the area of different parts of a house: areas.



#### Instructions

>Use the .index() method to get the index of the element in areas that is equal to 20.0. 

>Print out this index.

>Call .count() on areas to find out how many times 9.50 appears in the list. 

>Again, simply print out this number.

In [None]:
# Create list areas
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# Print out the index of the element 20.0
print(areas.index(20.0))

# Print out how often 9.50 appears in areas
print(areas.count(9.50))


## List Methods (continued)
Most list methods will change the list they're called on. Examples are:

>.append(), that adds an element to the list it is called on,

>.remove(), that removes the first element of a list that matches the input, and

>.reverse(), that reverses the order of the elements in the list it is called on.
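These methods change the list in place, so they return None instead of a new list. Writing areas = areas.append(...) would therefore replace your list with None. Here is a sketch on a short, hypothetical list called sizes:

```python
sizes = [11.25, 18.0, 20.0]

result = sizes.append(24.5)  # modifies sizes directly
print(result)                # None
print(sizes)                 # [11.25, 18.0, 20.0, 24.5]

sizes.remove(18.0)           # removes the first 18.0
sizes.reverse()
print(sizes)                 # [24.5, 20.0, 11.25]
```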

You'll be working on the list with the area of different parts of the house: areas.



#### Instructions

>Use .append() twice to add the size of the poolhouse and the garage again: 24.5 and 15.45, respectively. 

>Make sure to add them in this order.

>Print out areas

>Use the .reverse() method to reverse the order of the elements in areas.

>Print out areas once more.

In [None]:
# Create list areas
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# Use append twice to add poolhouse and garage size
poolhouse=24.5
garage=15.45
areas.append(poolhouse)
areas.append(garage)

# Print out areas
print(areas)

# Reverse the order of the elements in areas
areas.reverse()
# Print out areas
print(areas)

# Packages

#### Real-Life Example
Imagine you're planning a road trip with friends. You have a car (Python), but to make the journey enjoyable, you need some extras like a GPS, a music playlist, and snacks. Think of these extras as Python packages.

>GPS: NumPy helps you navigate data efficiently.

>Music Playlist: Matplotlib adds a visual element to your trip.

>Snacks: scikit-learn fuels your machine learning tasks.

>To get these extras, you first need to install them, just like you'd stop by a store. Use pip to install packages:


In [None]:

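# These are shell commands, not Python: run them in a terminal, not in this cell.
# (Inside Jupyter you can use the %pip magic instead, e.g. %pip install numpy.)
python3 get-pip.py   # only needed if pip is missing; download get-pip.py first
pip3 install numpy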

Once installed, bring them along by importing them:


In [None]:

import numpy as np


This way, your road trip (or Python project) is smooth and enjoyable!

## Import package

Let's say you wanted to calculate the circumference C and area A of a circle with radius r (0.43 in this exercise). Here's what those formulas look like:

$C = 2 \pi r$

$A = \pi r^2$

Rather than typing the number for pi, you can use the math package, which provides it as math.pi.

For reference, ** is the symbol for exponentiation. For example 3**4 is 3 to the power of 4 and will give 81.



#### Instructions

>Import the math package.

>Use math.pi to calculate the circumference of the circle and store it in C.

>Use math.pi to calculate the area of the circle and store it in A.

In [None]:
# Import the math package
import math

# Radius of the circle
r = 0.43

# Calculate C
C = 2 * math.pi * r

# Calculate A
A = math.pi * r ** 2

print("Circumference: " + str(C))
print("Area: " + str(A))

### Selective import
General imports, like import math, make all functionality from the math package available to you. However, if you decide to only use a specific part of a package, you can always make your import more selective:


In [None]:

from math import pi


Try the same thing again, but this time only use pi.



#### Instructions

>Perform a selective import from the math package where you only import pi.

>Use pi to calculate the circumference of the circle and store it in C.

>Use pi to calculate the area of the circle and store it in A.

In [None]:
# Import only pi from the math package
from math import pi

# Radius of the circle
r = 0.43

# Calculate C
C = 2 * pi * r

# Calculate A
A = pi * r ** 2

print("Circumference: " + str(C))
print("Area: " + str(A))
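A selective import has one side effect: only the imported name is bound, not the module name. Here is a sketch that assumes it runs in a fresh session. In this notebook, the earlier import math cell has already defined math, so math.pi would still work there.

```python
# Fresh session: the line below binds only the name pi
from math import pi

print(pi)  # 3.141592653589793

try:
    print(math.pi)  # 'math' itself was never imported by the selective import
except NameError as err:
    print("NameError:", err)
```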

### Different ways of importing
There are several ways to import packages and modules into Python. Depending on the import call, you'll have to use different Python code.

Suppose you want to use the function inv(), which is in the linalg subpackage of the scipy package. You want to be able to use this function as follows:


In [None]:

my_inv([[1,2], [3,4]])



Which import statement will you need in order to run the above code without an error?



#### Instructions

Possible answers

>import scipy

>import scipy.linalg

>from scipy.linalg import my_inv

>from scipy.linalg import inv as my_inv <<-----correct answer
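The four import styles from the answer list work like this. To keep the sketch runnable without installing scipy, it uses the standard-library math module, with sqrt standing in for scipy.linalg.inv. This is an analogy, not the scipy call itself:

```python
import math                       # use as math.sqrt(...)
import math as m                  # use as m.sqrt(...)
from math import sqrt             # use as sqrt(...)
from math import sqrt as my_sqrt  # use as my_sqrt(...), like my_inv above

print(math.sqrt(16), m.sqrt(16), sqrt(16), my_sqrt(16))  # 4.0 4.0 4.0 4.0
```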

# NumPy

## Your First NumPy Array
You're now going to dive into the world of baseball. Along the way, you'll get comfortable with the basics of numpy, a powerful package to do data science.

A list baseball has already been defined in the Python script, representing the height of some baseball players in centimeters. Can you add some code to create a numpy array from it?



#### Instructions

>Import the numpy package as np, so that you can refer to numpy with np.

>Use np.array() to create a numpy array from baseball. Name this array np_baseball.

>Print out the type of np_baseball to check that you got it right.

In [None]:
# Import the numpy package as np
import numpy as np

baseball = [180, 215, 210, 210, 188, 176, 209, 200]

# Create a numpy array from baseball: np_baseball
np_baseball=np.array(baseball)

# Print out type of np_baseball
print(type(np_baseball))

<class 'numpy.ndarray'>


#### Baseball players' height

You are a huge baseball fan. You decide to call the MLB (Major League Baseball) and ask around for some more statistics on the height of the main players. They pass along data on more than a thousand players, which is stored as a regular Python list: height_in. The height is expressed in inches. Can you make a numpy array out of it and convert the units to meters?

height_in is already available and the numpy package is loaded, so you can start straight away (Source: stat.ucla.edu).



#### Instructions

>Create a numpy array from height_in. Name this new array np_height_in.

>Print np_height_in.

>Multiply np_height_in with 0.0254 to convert all height measurements from inches to meters. 

>Store the new values in a new array, np_height_m.

>Print out np_height_m and check if the output makes sense.

In [None]:
# Load the MLB data with pandas to get height_in, then import numpy
import pandas as pd
mlb = pd.read_csv("https://assets.datacamp.com/course/intro_to_python/baseball.csv")
height_in = mlb['Height'].tolist()
import numpy as np

# Create a numpy array from height_in: np_height_in
np_height_in = np.array(height_in)

# Print out np_height_in
print(np_height_in)

# Convert np_height_in to m: np_height_m
np_height_m = np_height_in * 0.0254

# Print np_height_m
print(np_height_m)

[74 74 72 ... 75 75 73]
[1.8796 1.8796 1.8288 ... 1.905  1.905  1.8542]


## NumPy Side Effects

numpy is great for doing vector arithmetic. If you compare its functionality with regular Python lists, however, some things have changed.

First of all, numpy arrays cannot contain elements with different types. Second, the typical arithmetic operators, such as +, -, * and / have a different meaning for regular Python lists and numpy arrays.

Some lines of code have been provided for you. Try these out and select the one that would match this:


In [None]:
import numpy as np


np.array([True, 1, 2]) + np.array([3, 4, False])


array([4, 5, 2])


The numpy package is already imported as np.

Possible answers


>np.array([True, 1, 2, 3, 4, False])

>np.array([4, 3, 0]) + np.array([0, 2, 2]) <<-- correct answer: it evaluates to array([4, 5, 2]), the same as np.array([True, 1, 2]) + np.array([3, 4, False])

>np.array([1, 1, 2]) + np.array([3, 4, -1])

>np.array([0, 1, 2, 3, 4, 5])
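Why does that sum come out as array([4, 5, 2])? A numpy array holds a single type, so the booleans are converted to integers (True becomes 1, False becomes 0) before the addition. A sketch:

```python
import numpy as np

a = np.array([True, 1, 2])
b = np.array([3, 4, False])

print(a)      # [1 1 2]
print(b)      # [3 4 0]
print(a + b)  # [4 5 2]
```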

## Subsetting NumPy Arrays
Subsetting (using the square bracket notation on lists or arrays) works exactly the same with both lists and arrays.
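Here is a sketch that applies the same index and slice to a list and to an array built from it. The values are the baseball heights from the first NumPy exercise:

```python
import numpy as np

baseball = [180, 215, 210, 210, 188, 176, 209, 200]
np_baseball = np.array(baseball)

print(baseball[2], np_baseball[2])  # 210 210
print(baseball[1:4])                # [215, 210, 210]
print(np_baseball[1:4])             # [215 210 210]
```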

This exercise already has two lists, height_in and weight_lb, loaded in the background for you. These contain the height and weight of the MLB players as regular lists. It also has two numpy arrays, np_weight_lb and np_height_in, prepared for you.



#### Instructions

>Subset np_weight_lb by printing out the element at index 50.

>Print out a sub-array of np_height_in that contains the elements at index 100 up to and including index 110.

In [None]:
import pandas as pd
mlb = pd.read_csv("https://assets.datacamp.com/course/intro_to_python/baseball.csv")
weight_lb = mlb['Weight'].tolist()
height_in = mlb['Height'].tolist()
import numpy as np

np_weight_lb = np.array(weight_lb)
np_height_in = np.array(height_in)

# Print out the weight at index 50
print(np_weight_lb[50])

# Print out sub-array of np_height_in: index 100 up to and including index 110
print(np_height_in[100:111])

200
[73 74 72 73 69 72 73 75 75 73 72]


## NumPy: remarks

### Different types: different behavior!

In [None]:
import numpy as np

python_list = [1, 2, 3]
numpy_array = np.array([1, 2, 3])
list_sum = python_list + python_list    # list + list: concatenation
array_sum = numpy_array + numpy_array   # array + array: element-wise addition

print(list_sum)
print(array_sum)

[1, 2, 3, 1, 2, 3]
[2 4 6]


In [None]:
import numpy as np

height = [1.73, 1.68, 1.71, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]
weight / height ** 2

# Fails with a TypeError: plain lists don't support ** with a number


TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

In [None]:

import numpy as np
np_height = np.array(height)
np_weight = np.array(weight)
np_weight / np_height ** 2


array([21.85171573, 20.97505669, 21.75028214, 24.7473475 , 21.44127836])

### NumPy arrays: contain only one type


In [None]:
import numpy as np

np.array([1.0, "is", True])


# Every element was converted to a string (see the dtype in the output)


array(['1.0', 'is', 'True'], dtype='<U32')

### NumPy Subsetting

In [20]:
import pandas as pd
mlb = pd.read_csv("https://assets.datacamp.com/course/intro_to_python/baseball.csv")
weight_lb = mlb['Weight'].tolist()
height_in = mlb['Height'].tolist()
age_yr = mlb['Age'].tolist()
import numpy as np

np_weight = np.array(weight_lb)
np_height = np.array(height_in)
np_age = np.array(age_yr)

bmi=703*np_weight / np_height**2
print(bmi)

print(bmi[1])
print(bmi > 23)
print(bmi[bmi > 23])

[23.10810811 27.60135135 28.47800926 ... 25.62044444 23.74577778
 25.72433853]
27.60135135135135
[ True  True  True ...  True  True  True]
[23.10810811 27.60135135 28.47800926 28.47800926 24.80090073 25.98781769
 30.8605335  27.89129141 28.11513158 25.10216227 24.80090073 23.74554325
 23.75       26.57844991 26.54183673 24.93282042 23.12088889 25.30522682
 25.9077071  24.95065789 29.52702703 23.73355263 24.40972222 26.77563975
 28.12       24.06965762 25.03378378 24.00938262 24.13513514 23.10979619
 23.74554325 24.99555556 26.38393695 30.61955556 29.99466667 27.60135135
 27.31674018 24.40487998 25.5472973  26.38393695 28.36273222 24.34210526
 26.31756757 26.44594595 26.68530612 25.49459877 26.08534323 26.95945946
 27.97653061 26.38393695 24.99555556 25.80263158 27.26315789 24.265286
 26.31756757 28.24324324 23.73355263 23.71394839 27.85855815 26.34516765
 33.744      23.71394839 26.24533333 23.125      28.24324324 24.40972222
 27.79996142 26.24533333 29.02233064 27.83505348 26.38393695
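The output above shows bmi > 23 and bmi[bmi > 23] on a thousand values. This sketch runs the same steps on five values so the effect is easier to follow. The values are the BMI results computed earlier from the height and weight lists, rounded to two decimals. (The 703 in the cell above is the usual factor for computing BMI from pounds and inches.)

```python
import numpy as np

bmi = np.array([21.85, 20.98, 21.75, 24.75, 21.44])

high = bmi > 23      # one True/False per element; True only for 24.75
print(high)
print(bmi[high])     # [24.75]
print(np.sum(high))  # 1, because True counts as 1
```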

In [22]:
# Combine np_weight, np_height, np_age and bmi into one 2D array (one column each)
combined = np.column_stack((np_weight, np_height, np_age, bmi))

print(combined)

[[180.          74.          22.99        23.10810811]
 [215.          74.          34.69        27.60135135]
 [210.          72.          30.78        28.47800926]
 ...
 [205.          75.          25.19        25.62044444]
 [190.          75.          31.01        23.74577778]
 [195.          73.          27.92        25.72433853]]


## Your First 2D NumPy Array
Before working on the actual MLB data, let's try to create a 2D numpy array from a small list of lists.

In this exercise, baseball is a list of lists. The main list contains 4 elements, one per baseball player. Each of these elements is a list containing that player's height and weight, in this order. baseball is already coded for you in the script.



#### Instructions

>Use np.array() to create a 2D numpy array from baseball. Name it np_baseball.

>Print out the type of np_baseball.

>Print out the shape attribute of np_baseball. Use np_baseball.shape.

In [2]:
import numpy as np

baseball = [[180, 78.4],
            [215, 102.7],
            [210, 98.5],
            [188, 75.2]]

# Create a 2D numpy array from baseball: np_baseball
np_baseball = np.array(baseball)

# Print out the type of np_baseball
print(type(np_baseball))

# Print out the shape of np_baseball
print(np_baseball.shape)


<class 'numpy.ndarray'>
(4, 2)


#### Baseball data in 2D form
You realize that it makes more sense to restructure all this information in a 2D numpy array.

You have a Python list of lists. In this list of lists, each sublist represents the height and weight of a single baseball player. The name of this list is baseball and it has been loaded for you already (although you can't see it).

Store the data as a 2D array to unlock numpy's extra functionality.



#### Instructions

>Use np.array() to create a 2D numpy array from baseball. Name it np_baseball.

>Print out the shape attribute of np_baseball.

In [3]:
import numpy as np

# Create a 2D numpy array from baseball: np_baseball
np_baseball = np.array(baseball)

# Print out the shape of np_baseball
print(np_baseball.shape)
print(np_baseball)

(4, 2)
[[180.   78.4]
 [215.  102.7]
 [210.   98.5]
 [188.   75.2]]


### Subsetting 2D NumPy Arrays
If your 2D numpy array has a regular structure, i.e. each row and column has a fixed number of values, complicated ways of subsetting become very easy. Have a look at the code below where the elements "a" and "c" are extracted from a list of lists.


In [7]:

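import numpy as np
# Regular Python: the first element of each inner list
x = [["a", "b"], ["c", "d"]]
[x[0][0], x[1][0]]  # ['a', 'c']
# numpy: all rows (:), column 0
np_x = np.array(x)
np_x[:, 0]  # array(['a', 'c'], dtype='<U1')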

>The indexes before the comma refer to the rows, while those after the comma refer to the columns. The : is for slicing; in this example, it tells Python to include all rows.
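The same row/column rules apply to any 2D array. The sketch below uses the first three rows of the small `baseball` list from earlier as sample data. It selects a row, a column, a single element and a block of rows:

```python
import numpy as np

# Sample data: each row is one player, columns are height and weight
np_players = np.array([[180, 78.4],
                       [215, 102.7],
                       [210, 98.5]])

first_row = np_players[0, :]   # all columns of row 0
weights = np_players[:, 1]     # all rows of column 1
one_value = np_players[2, 0]   # row 2, column 0
block = np_players[1:3, :]     # rows 1 and 2 (the end index 3 is excluded)

print(first_row, weights, one_value, block.shape)
```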



Instructions

>Print out the 50th row of np_baseball.

>Make a new variable, np_weight_lb, containing the entire second column of np_baseball.

>Select the height (first column) of the 124th baseball player in np_baseball and print it out.

In [17]:
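import pandas as pd
import numpy as np

# Load height (column 0) and weight (column 1) from the MLB data
mlb = pd.read_csv("https://assets.datacamp.com/course/intro_to_python/baseball.csv")
np_height = np.array(mlb['Height'])
np_weight = np.array(mlb['Weight'])
np_baseball = np.column_stack((np_height, np_weight))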


# Print out the 50th row of np_baseball
print(np_baseball[49,:])

# Select the entire second column of np_baseball: np_weight_lb
np_weight_lb = np_baseball[:,1]

# Print out height of 124th player
print(np_baseball[123,0])

[ 70 195]
75


### 2D Arithmetic
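Like 1D NumPy arrays, 2D NumPy arrays perform calculations element by element. You can combine a 2D array with a single number, with a 1D array that has one value per column (applied to every row), or with another 2D array of the same shape.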

np_baseball is coded for you; it's again a 2D numpy array with 3 columns representing height (in inches), weight (in pounds) and age (in years). baseball is available as a regular list of lists and updated is available as 2D numpy array.
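The sketch below shows the three kinds of combination on a small 2D array of heights (inches) and weights (pounds). The data and the `changes` array are made up for illustration only:

```python
import numpy as np

# Hypothetical 2D array: each row is (height in inches, weight in pounds)
np_hw = np.array([[74.0, 180.0],
                  [72.0, 215.0]])

# 2D array with a single number: applied to every element
doubled = np_hw * 2

# 2D array with a 1D array: the 1D array is applied to every row
# (column 0: inches -> meters, column 1: pounds -> kilograms)
to_metric = np.array([0.0254, 0.453592])
metric = np_hw * to_metric

# 2D array with another 2D array of the same shape: element by element
changes = np.array([[1.0, 5.0],
                    [0.0, -3.0]])
updated_hw = np_hw + changes

print(doubled)
print(metric)
print(updated_hw)
```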



Instructions

>You managed to get hold of the changes in height, weight and age of all baseball players. It is available as a 2D numpy array, updated. Add np_baseball and updated and print out the result.

>You want to convert the units of height and weight to metric (meters and kilograms, respectively). 

>As a first step, create a numpy array with three values: 0.0254, 0.453592 and 1. Name this array conversion.

>Multiply np_baseball with conversion and print out the result.

In [33]:
import pandas as pd
import numpy as np
mlb = pd.read_csv("https://assets.datacamp.com/course/intro_to_python/baseball.csv")
weight_lb = mlb['Weight'].tolist()
height_in = mlb['Height'].tolist()
age_yr = mlb['Age'].tolist()

np_weight = np.array(weight_lb)
np_height = np.array(height_in)
np_age = np.array(age_yr)

baseball = np.column_stack((np_height,np_weight,np_age))

np_baseball = np.array(baseball)

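# `updated` isn't defined in this notebook, so a single row of changes is used
# as a stand-in; NumPy broadcasts it to every row of np_baseball
updated = np.array([1.8796, 97.52228, 34.69])

# Print out addition of np_baseball and updated
print(np_baseball + updated)

# Create numpy array: conversion
conversion = np.array([0.0254, 0.453592, 1])
print(conversion)

# Print out product of np_baseball and conversion
product = np_baseball * conversion
print(product)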

[[ 75.8796  277.52228  57.68   ]
 [ 75.8796  312.52228  69.38   ]
 [ 73.8796  307.52228  65.47   ]
 ...
 [ 76.8796  302.52228  59.88   ]
 [ 76.8796  287.52228  65.7    ]
 [ 74.8796  292.52228  62.61   ]]
[0.0254   0.453592 1.      ]
[[ 1.8796  81.64656 22.99   ]
 [ 1.8796  97.52228 34.69   ]
 [ 1.8288  95.25432 30.78   ]
 ...
 [ 1.905   92.98636 25.19   ]
 [ 1.905   86.18248 31.01   ]
 [ 1.8542  88.45044 27.92   ]]


## NumPy: Basic Statistics

Real-Life Example

Imagine you're organizing a community health fair and you collect data from 5000 participants about their height and weight. This data is stored in a 2D NumPy array called np_city. To make sense of this data:

Average Height: Use np.mean() to find the average height. This helps you understand the typical height in your community.


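In [34]:

import numpy as np

# Simulated data for illustration only: 5000 heights (m) and weights (kg)
height = np.round(np.random.normal(1.75, 0.20, 5000), 2)
weight = np.round(np.random.normal(60.32, 15, 5000), 2)
np_city = np.column_stack((height, weight))

average_height = np.mean(np_city[:, 0])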


>Median Height: Use np.median() to find the median height, giving you the middle value when sorted.

>Correlation: Check if height and weight are related using np.corrcoef().

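These summaries reduce thousands of raw measurements to a few meaningful numbers. They also help you spot suspicious values, such as outliers, before you draw conclusions.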

In this lesson, you'll dive into the basics of analyzing data using NumPy. You’ll learn how to generate summarizing statistics from large datasets, like a city-wide survey of 5000 adults' heights and weights, stored in a 2D NumPy array.

Key points include:



>Extracting data columns for analysis

>Calculating mean and median with np.mean() and np.median()

>Using np.corrcoef() for correlation and np.std() for standard deviation

>Understanding NumPy's speed advantage due to consistent data types
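The sketch below applies these functions to a whole 2D array at once. It assumes a small, hypothetical `np_city`-style sample with height (m) in column 0 and weight (kg) in column 1. Passing `axis=0` computes one statistic per column:

```python
import numpy as np

# Hypothetical sample: 5 people, columns are height (m) and weight (kg)
np_sample = np.array([[1.60, 55.0],
                      [1.70, 65.0],
                      [1.75, 70.0],
                      [1.80, 80.0],
                      [1.90, 90.0]])

means = np.mean(np_sample, axis=0)      # one mean per column
medians = np.median(np_sample, axis=0)  # one median per column
stds = np.std(np_sample, axis=0)        # one standard deviation per column

# Correlation between height (column 0) and weight (column 1)
corr = np.corrcoef(np_sample[:, 0], np_sample[:, 1])[0, 1]

print("Means:", means)
print("Medians:", medians)
print("Std devs:", stds)
print("Height/weight correlation:", corr)
```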

Here's a quick example to calculate the average height:


In [None]:

import numpy as np
average_height = np.mean(np_city[:, 0])

Now, proceed to the exercises to practice these concepts!

### Average versus median
You now know how to use numpy functions to get a better feeling for your data.

The baseball data is available as a 2D numpy array with 3 columns (height, weight, age) and 1015 rows. The name of this numpy array is np_baseball. After restructuring the data, however, you notice that some height values are abnormally high. Follow the instructions and discover which summary statistic is best suited if you're dealing with so-called outliers. np_baseball is available.



Instructions

>Create a numpy array np_height_in that is equal to the first column of np_baseball.

>Print out the mean of np_height_in.

>Print out the median of np_height_in.

In [2]:
import pandas as pd
import numpy as np
mlb = pd.read_csv("https://assets.datacamp.com/course/intro_to_python/baseball.csv")
weight_lb = mlb['Weight'].tolist()
height_in = mlb['Height'].tolist()
age_yr = mlb['Age'].tolist()

np_weight = np.array(weight_lb)
np_height = np.array(height_in)
np_age = np.array(age_yr)

baseball = np.column_stack((np_height,np_weight,np_age))

np_baseball = np.array(baseball)


# Create np_height_in from np_baseball
np_height_in = np_baseball[:, 0]
print(np_height_in)

# Print out the mean of np_height_in
print("Mean =", np.mean(np_height_in))

# Print out the median of np_height_in
print("Median =", np.median(np_height_in))


[74. 74. 72. ... 75. 75. 73.]
Mean = 73.6896551724138
Median = 74.0
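In the data loaded here the mean and median are close. The sketch below shows what one abnormal value does to each statistic. It uses hypothetical heights with a made-up data-entry error (7400 typed instead of 74):

```python
import numpy as np

# Hypothetical heights in inches, with one data-entry error
heights = np.array([72, 73, 74, 75, 7400])

print("Mean =", np.mean(heights))      # pulled far up by the single outlier
print("Median =", np.median(heights))  # still the middle value: 74.0
```

The median only depends on the middle of the sorted values, so it is the better summary when outliers are present.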


### Square Root


np.sqrt() computes the square root of a number. When applied to an array, it works element by element.
Example:

In [15]:
import numpy as np
root = np.sqrt(9)
print(root)

3.0
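np.sqrt also accepts a whole array and returns the square root of every element. Here is a short sketch with arbitrary example values:

```python
import numpy as np

# np.sqrt works element by element on a whole array
values = np.array([4, 9, 16, 2])
roots = np.sqrt(values)
print(roots)
```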


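➤ Relationship to the standard deviation (SD):

The standard deviation is the square root of the variance, so:


In [None]:

import numpy as np
data = [1, 2, 3, 4, 5]
# np.std(data) equals np.sqrt(np.var(data)); np.isclose avoids an exact float comparison
print(np.isclose(np.std(data), np.sqrt(np.var(data))))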

### Covariance

![image.png](attachment:image.png)

(xi - x̄) = how far a point's X value is from the mean of X

(yi - ȳ) = how far a point's Y value is from the mean of Y

Multiplying the two shows the direction of the relationship:

positive × positive → positive

negative × negative → positive

positive × negative → negative
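The sketch below computes the covariance formula step by step in NumPy and compares the result with `np.cov`. The data is hypothetical. `np.cov` divides by n - 1 by default (the sample covariance), so the manual version does the same. The formula in the image may divide by n instead, which `np.cov(..., bias=True)` reproduces.

```python
import numpy as np

# Hypothetical paired data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Deviations from each mean: (xi - x̄) and (yi - ȳ)
dx = x - np.mean(x)
dy = y - np.mean(y)

# Sample covariance: sum of the products of deviations, divided by n - 1
cov_manual = np.sum(dx * dy) / (len(x) - 1)

# np.cov returns a 2x2 matrix; the off-diagonal entry is cov(x, y)
cov_numpy = np.cov(x, y)[0, 1]

print(cov_manual, cov_numpy)
```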



##### A simple analogy: covariance vs. variance
>🔧 Covariance:
Imagine two roller coasters (X and Y). You want to know whether they go up and down together.

If they always rise and fall together, their covariance is large and positive.

If one goes up while the other goes down → negative covariance.

If their movements are random and out of sync → small covariance (close to zero).

>🔧 Variance:
You watch only one roller coaster and want to know how wildly it moves away from the middle (its mean).

### Variance 


Variance is the special case of covariance where both variables are the same:

![image.png](attachment:image.png)

So variance = how spread out the data is around the mean.

Variance looks at only one variable.

Covariance compares two variables.



Variance measures how spread out the data is from its mean value.

➤ Analogy:
You have a set of numbers. If all of them are close to the mean, the variance is small. If they are spread far apart, the variance is large.

For example:

[5, 5, 5, 5, 5] → Variance = 0

[1, 5, 9] → Variance ≈ 10.67 with np.var (the values lie far from their mean of 5)



In [8]:

import numpy as np
data = [1, 2, 3, 4, 5]
np.var(data)  

np.float64(2.0)
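To see where the 2.0 comes from, the sketch below repeats the calculation by hand on the same `data`. It also shows the `ddof` parameter. `np.var` divides by n by default (population variance), and `ddof=1` divides by n - 1 instead (sample variance):

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

# Variance by hand: mean of the squared deviations from the mean
deviations = data - np.mean(data)       # [-2, -1, 0, 1, 2]
var_manual = np.mean(deviations ** 2)   # (4 + 1 + 0 + 1 + 4) / 5

# Population variance (divide by n) vs. sample variance (divide by n - 1)
var_population = np.var(data)
var_sample = np.var(data, ddof=1)       # 10 / 4

print(var_manual, var_population, var_sample)
```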


### Standard Deviation
![image.png](attachment:image.png)

The standard deviation (SD) measures how spread out the data is around the mean.


Imagine you and your friends run a 100-meter race. If everyone crosses the finish line in almost the same time (for example 12, 12.1 and 11.9 seconds), then:

>➤ The standard deviation is small → all runners have similar speeds.

But if one runner finishes in 10 seconds, another in 14 and another in 17, then:

>➤ The standard deviation is large → the results are spread far from the mean.
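The two race scenarios can be checked directly with `np.std`. The finish times below are the ones from the analogy:

```python
import numpy as np

# Finish times (seconds) from the two race scenarios above
close_race = np.array([12.0, 12.1, 11.9])
spread_race = np.array([10.0, 14.0, 17.0])

sd_close = np.std(close_race)    # small: times are similar
sd_spread = np.std(spread_race)  # large: times are far apart

print("SD close race: ", sd_close)
print("SD spread race:", sd_spread)
```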



##### When is it used?
Whenever you want to know:

>Whether the data is consistent or varies a lot

>Whether there are outliers (values that deviate far from the rest)

>Whether a process is stable (for example machine temperature, athlete performance, website traffic)

In [7]:
import numpy as np

data = [10, 12, 13, 15, 10]
std = np.std(data)
print(std)

1.8973665961010275


>This means the values lie, on average (in the root-mean-square sense), about 1.9 away from their mean of 12.

![image.png](attachment:image.png)

### Correlation Coefficient


![image.png](attachment:image.png)
or
![image-2.png](attachment:image-2.png)

np.corrcoef() is a NumPy function that computes the Pearson correlation coefficient between two (or more) arrays. Pearson correlation measures the linear relationship between two variables, and its value always lies between -1 and +1:

>+1: perfect positive correlation

>0: no linear correlation

>-1: perfect negative correlation
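The sketch below computes Pearson's r by hand and checks it against `np.corrcoef`. The data is hypothetical. Pearson's r is the covariance divided by the product of the two standard deviations. Written with plain sums, the n or n - 1 factors cancel out:

```python
import numpy as np

# Hypothetical paired data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

dx = x - np.mean(x)
dy = y - np.mean(y)

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# np.corrcoef returns a matrix; [0, 1] is the x-y correlation
r_numpy = np.corrcoef(x, y)[0, 1]

print(r_manual, r_numpy)
```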

##### Analogy:

Suppose you have two kittens:
🐱 Cat A likes to sunbathe by the window
🐱 Cat B likes to sleep on the sofa

Every day you record:

Whether Cat A sunbathed (1 = yes, 0 = no)

Whether Cat B slept on the sofa (1 = yes, 0 = no)

After a month you wonder:

>“When Cat A sunbathes, is Cat B also sleeping on the sofa? Do they tend to do these things at the same time?”

This is where np.corrcoef comes in.
It helps answer:

How strong is the relationship between the two behaviors?

The result:

1.0 = they always do their activities together

0.0 = no relationship; sometimes yes, sometimes no

-1.0 = whenever one does it, the other doesn't


🎯 Interpreting Correlation Values

| Correlation value | Relationship | Simple analogy |
|---|---|---|
| +1.0 | Perfect positive | When A goes up, B always goes up with it |
| +0.7 to +0.9 | Strong positive | When A goes up, B almost always goes up too |
| +0.4 to +0.6 | Moderate positive | When A goes up, B often goes up, but not always |
| +0.1 to +0.3 | Weak positive | Sometimes when A goes up, B also goes up |
| 0.0 | No linear relationship | A goes up or down; B can't be predicted |
| -0.1 to -0.3 | Weak negative | Sometimes when A goes up, B goes down |
| -0.4 to -0.6 | Moderate negative | When A goes up, B often goes down |
| -0.7 to -0.9 | Strong negative | When A goes up, B almost always goes down |
| -1.0 | Perfect negative | When A goes up, B always goes down |




📈 What the sign means:

>Positive (+): when one goes up, the other also goes up

>Negative (-): when one goes up, the other goes down

>Close to 0: no predictable linear pattern
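Below are two hypothetical cases. The first is a perfectly negative linear relationship. The second is a relationship that is completely predictable but not linear. Pearson's r only detects linear patterns, so the curved one scores about 0:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# Perfect negative linear relationship: y goes down as x goes up
y_down = 10 - 2 * x
r_down = np.corrcoef(x, y_down)[0, 1]

# Fully predictable but not linear: y = (x - 3) ** 2
y_curve = (x - 3) ** 2
r_curve = np.corrcoef(x, y_curve)[0, 1]

print(r_down)   # -1.0, up to floating-point rounding
print(r_curve)  # about 0, even though y depends exactly on x
```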



In [5]:
import numpy as np

kucing_a = [1, 1, 0, 1, 0]
kucing_b = [1, 1, 0, 0, 0]

print(np.corrcoef(kucing_a, kucing_b)) 
print(np.corrcoef(kucing_a, kucing_b)[0, 1])  # Access the correlation coefficient directly

[[1.         0.66666667]
 [0.66666667 1.        ]]
0.6666666666666666


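For example, suppose the result looked like this (0.87 is an illustrative value, not the output of the cell above):

In [None]:
[[1.     0.87]
 [0.87  1.   ]]


This is called a 2x2 correlation matrix.
Because you passed two data sets (for example, Cat A and Cat B), the result has 2 rows and 2 columns.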

|  | Cat A | Cat B |
|---|---|---|
| Cat A | 1.0 | 0.87 |
| Cat B | 0.87 | 1.0 |


>Row 1, Column 1 = correlation of Cat A with itself

>Row 1, Column 2 = correlation of Cat A with Cat B

>Row 2, Column 1 = correlation of Cat B with Cat A

>Row 2, Column 2 = correlation of Cat B with itself

Why do only the two off-diagonal numbers matter?
Because:

>The correlation of a variable with itself is always 1.0

>All we need is the relationship between the two, which is 0.87

>And because correlation is symmetric (the correlation of A with B equals the correlation of B with A):

>np.corrcoef(x, y) = np.corrcoef(y, x)
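Both properties can be checked directly on the kitten data from the cell above. The diagonal is all 1.0, and swapping the arguments gives the same matrix:

```python
import numpy as np

kucing_a = [1, 1, 0, 1, 0]
kucing_b = [1, 1, 0, 0, 0]

m_ab = np.corrcoef(kucing_a, kucing_b)
m_ba = np.corrcoef(kucing_b, kucing_a)

# Diagonal: each variable with itself is always 1.0
print(np.diag(m_ab))
# Symmetry: r(A, B) == r(B, A), so both matrices are the same
print(np.allclose(m_ab, m_ba))
```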


>💡 Why does this matter in the real world?
Say you work as a data analyst and look at:

The number of ads and product sales both going up

Is the increase in ads correlated with the increase in sales?

With np.corrcoef, you can:

✅ Quantify the relationship with a number (correlation alone does not prove that the ads caused the sales)

✅ Make decisions based on data (not gut feeling)



Summary:

np.corrcoef is like a gauge of how closely two things move together.
It is used to answer the question:

>"Do these two things tend to happen together?"

📊 Example 1: Correlation between two arrays

In [5]:
import numpy as np

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

correlation_matrix = np.corrcoef(x, y)
print(correlation_matrix)

[[1. 1.]
 [1. 1.]]


In [6]:
import numpy as np

x = [1, 2, 8, 4, 3]
y = [2, 6, 4, 8, 10]

correlation_matrix = np.corrcoef(x, y)
print(correlation_matrix)

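[[ 1.00000000e+00 -2.59883553e-17]
 [-2.59883553e-17  1.00000000e+00]]

The tiny off-diagonal value -2.59883553e-17 is floating-point round-off. For these x and y the correlation is exactly 0.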


📊 Example 2: Correlation between several variables (2D array)

In [None]:
import numpy as np

data = np.array([
    [1, 2, 6],  # Variable A
    [8, 4, 6],  # Variable B
    [7, 3, 9]   # Variable C
])

np.corrcoef(data)

array([[ 1.        , -0.18898224,  0.61858957],
       [-0.18898224,  1.        ,  0.65465367],
       [ 0.61858957,  0.65465367,  1.        ]])
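By default, `np.corrcoef` treats each row of a 2D array as one variable, which is why variables A, B and C are written as rows above. Data laid out like `np_baseball`, with one variable per column, needs `rowvar=False` or a transpose. Here is a minimal sketch with hypothetical height/weight data:

```python
import numpy as np

# Hypothetical data, one column per variable (like np_baseball):
# column 0 = height, column 1 = weight, 4 observations (rows)
np_cols = np.array([[70, 180],
                    [72, 190],
                    [74, 200],
                    [76, 215]])

# rowvar=False: columns are variables -> 2x2 matrix
by_columns = np.corrcoef(np_cols, rowvar=False)

# Same result by transposing so that each row is a variable
by_rows = np.corrcoef(np_cols.T)

# Without rowvar=False, the 4 rows would be treated as 4 variables (4x4 matrix)
print(by_columns.shape)
print(np.allclose(by_columns, by_rows))
```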

In [4]:
import numpy as np

data = np.array([
    [1, 2, 3],  # Variable A
    [4, 5, 6],  # Variable B
    [7, 8, 9]   # Variable C
])

np.corrcoef(data)

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

#### Explore the baseball data
Because the mean and median are so far apart, you decide to complain to the MLB. They find the error and send the corrected data over to you. It's again available as a 2D NumPy array np_baseball, with three columns.

The Python script in the editor already includes code to print out informative messages with the different summary statistics and numpy is already loaded as np. Can you finish the job? np_baseball is available.



Instructions

>The code to print out the mean height is already included. Complete the code for the median height.

>Use np.std() on the first column of np_baseball to calculate stddev.

>Do big players tend to be heavier? Use np.corrcoef() to store the correlation between the first and second column of np_baseball in corr.

In [16]:

import pandas as pd
import numpy as np
mlb = pd.read_csv("https://assets.datacamp.com/course/intro_to_python/baseball.csv")
weight_lb = mlb['Weight'].tolist()
height_in = mlb['Height'].tolist()
age_yr = mlb['Age'].tolist()

np_weight = np.array(weight_lb)
np_height = np.array(height_in)
np_age = np.array(age_yr)

baseball = np.column_stack((np_height,np_weight,np_age))

np_baseball = np.array(baseball)

avg = np.mean(np_baseball[:,0])
print("Average: " + str(avg))

# Print median height
med = np.median(np_baseball[:,0])
print("Median: " + str(med))

# Print out the standard deviation on height
stddev = np.std(np_baseball[:,0])
print("Standard Deviation: " + str(stddev))

# Print out correlation between first and second column
corr = np.corrcoef(np_baseball[:,0],np_baseball[:,1])
print("Correlation: " + str(corr))

Average: 73.6896551724138
Median: 74.0
Standard Deviation: 2.312791881046546
Correlation: [[1.         0.53153932]
 [0.53153932 1.        ]]
