<h1 id="7464">
Python Data Types
</h1>
<p id="1c30">
In Python, we have many data types. The most common ones are float (floating point), int (integer), str (string), bool (Boolean), list, and dict (dictionary).
</p>
<ul>
<li id="2c2e">
float - used for real numbers.
</li>
<li id="a366">
int - used for integers.
</li>
<li id="1a7f">
str - used for text. We can define strings using single quotes
<code>
'value'
</code>
, double quotes
<code>
"value"
</code>
, or triple quotes
<code>
"""value"""
</code>
. Triple-quoted strings can span multiple lines, and the line breaks are included in the value of the variable. They’re also used for writing function documentation.
</li>
<li id="9187">
bool - used for truth values (True and False). Useful for filtering data.
</li>
<li id="56a0">
list - used to store a collection of values.
</li>
<li id="0107">
dict - used to store key-value pairs.
</li>
</ul>
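<p>
As a small illustration of the triple-quoted strings mentioned in the list above, the sketch below shows that the line break becomes part of the value and that a function's documentation string is stored on the function. The names <code>poem</code> and <code>greet</code> are made up for this example.
</p>

```python
# A triple-quoted string keeps the line breaks typed inside it.
poem = """Roses are red
Violets are blue"""

print(poem.count("\n"))   # 1
print(poem.splitlines())  # ['Roses are red', 'Violets are blue']


def greet(name):
    """Return a short greeting for the given name."""
    return "Hello, " + name


# The docstring is stored on the function object.
print(greet.__doc__)  # Return a short greeting for the given name.
```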
<p id="57ee">
We can use the
<code>
type(variable_name)
</code>
function to check the type of a specific variable. Operators in Python behave differently depending on the variable’s type and there are different built-in methods for each one.
</p>
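<p>
To make the remark about operators concrete, here is a minimal sketch of how <code>+</code> and <code>*</code> behave for a few built-in types. The variable names are placeholders chosen for this example.
</p>

```python
count = 2
label = "ab"
items = [1, 2]

print(count + count)  # 4 (arithmetic addition)
print(label + label)  # abab (string concatenation)
print(items + items)  # [1, 2, 1, 2] (list concatenation)
print(label * 3)      # ababab (repetition)

# Mixing an int with a float produces a float.
print(type(count + 0.5))  # <class 'float'>

# Adding a string and an integer is not defined and raises TypeError.
try:
    label + count
except TypeError as error:
    print("TypeError:", error)
```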
<p id="d207">
Here are some examples of creating floating-point numbers, integers, strings, and booleans in Python.
</p>

In [1]:
year_of_birth = 1994
height_cm = 170.50
subject = "Data Science"
is_success = True

print(type(year_of_birth), type(height_cm), type(subject), type(is_success))

<class 'int'> <class 'float'> <class 'str'> <class 'bool'>


<h1 id="a308">
Python Lists
</h1>
<p id="2182">
A Python list is a basic sequence type. We can use this type to store a collection of values. One list can contain values of
<strong>
any type
</strong>
. A list can even contain other (nested) lists as its values. It’s not commonly used, but you can have a list with a mix of Python types. You can create a new list using square brackets like this:
</p>
<p id="da02">
<code>
fruits = ["pineapple", "apple", "lemon", "strawberry", "orange", "kiwi"]
</code>
</p>
<h2 id="e372">
Subsetting Lists
</h2>
<p id="abda">
You can use indexes to get one or more elements from a list. In Python, indexes start from
<code>
0
</code>
, so the first element in the list has index
<code>
0
</code>
. We can also use negative indexes to access elements. The last element in the list has index
<code>
-1
</code>
, the one before it has index
<code>
-2
</code>
, and so on. Python also has
<strong>
list slicing
</strong>
, which can be used to get multiple elements from a list. We can use it like this:
<code>
sliceable[start_index:end_index:step]
</code>
.
</p>
<ul>
<li id="b202">
The
<code>
start_index
</code>
is the beginning index of the slice. The element at this index is included in the result. The default value is
<code>
0
</code>
(or the last index if the step is negative).
</li>
<li id="e1d3">
The
<code>
end_index
</code>
is the end index of the slice. The element at this index is
<strong>
not included
</strong>
in the result. The default value is the
<code>
length of the list
</code>
, or the equivalent of
<code>
-(length of the list) - 1
</code>
if the step is negative. If you skip this value, you get all the elements from the start index onward in the direction of the step.
</li>
<li id="2875">
The
<code>
step
</code>
is the amount by which the index increases. The default value is
<code>
1
</code>
. If we set a negative value for the step, we’ll move backward.
</li>
</ul>

In [4]:
fruits = ["pineapple", "apple", "lemon", "strawberry", "orange", "kiwi"]
fruits[1]  # "apple"
fruits[0]  # "pineapple"
fruits[-1] # "kiwi"
fruits[5]  # "kiwi"
fruits[-3] # "strawberry"

# List slicing
fruits[::]    # ["pineapple", "apple", "lemon", "strawberry", "orange", "kiwi"]
fruits[0:2]   # ["pineapple", "apple"]
fruits[-2:-1] # ["orange"]
fruits[3:]    # ["strawberry", "orange", "kiwi"]
fruits[:4]    # ["pineapple", "apple", "lemon", "strawberry"]
fruits[:]     # ["pineapple", "apple", "lemon", "strawberry", "orange", "kiwi"]
fruits[::-1]  # ["kiwi", "orange", "strawberry", "lemon", "apple", "pineapple"]
fruits[::-2]  # ["kiwi", "strawberry", "apple"]
fruits[::2]   # ["pineapple", "lemon", "orange"]

['pineapple', 'lemon', 'orange']
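<p>
The default values described above can be checked by writing them out explicitly. The sketch below reuses the <code>fruits</code> list and also shows one difference between slicing and indexing: a slice tolerates out-of-range bounds, while a single index does not.
</p>

```python
fruits = ["pineapple", "apple", "lemon", "strawberry", "orange", "kiwi"]
n = len(fruits)

# Positive step: the defaults are start=0 and end=len(fruits).
print(fruits[::2] == fruits[0:n:2])  # True

# Negative step: the slice starts at the last element and runs
# past the first one, which -n - 1 expresses as an explicit end.
print(fruits[::-1] == fruits[-1:-n - 1:-1])  # True

# Slices silently clip bounds that are out of range ...
print(fruits[4:100])  # ['orange', 'kiwi']

# ... but indexing a single element out of range raises IndexError.
try:
    fruits[100]
except IndexError as error:
    print("IndexError:", error)  # IndexError: list index out of range
```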

<h2 id="3916">
List Manipulation
</h2>
<ul>
<li id="b899">
We can add an element to a list using the
<code>
append
</code>
method, or add several elements using the
<code>
+
</code>
operator. If you use the plus operator on two lists, Python returns a new list with the contents of both lists.
</li>
<li id="24ae">
We can change one or more elements of a list using the same square brackets that we already used for indexing and list slicing.
</li>
<li id="40ba">
We can delete an element from a list with the
<code>
remove(value)
</code>
method. This method will delete the first element of the list with the passed value.
</li>
</ul>

In [None]:
# Add values to a list
fruits.append("peach")
fruits # ["pineapple", "apple", "lemon", "strawberry", "orange", "kiwi", "peach"]
fruits = fruits + ["fig", "melon"]
fruits # ["pineapple", "apple", "lemon", "strawberry", "orange", "kiwi", "peach", "fig", "melon"]

# Change values in a list
fruits[0:2] = ["grape", "mango"]
fruits # ["grape", "mango", "lemon", "strawberry", "orange", "kiwi", "peach", "fig", "melon"]

# Delete values from a list
fruits.remove("mango")
fruits # ["grape", "lemon", "strawberry", "orange", "kiwi", "peach", "fig", "melon"]
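<p>
One detail worth seeing in code is that <code>append</code> changes the existing list, while the plus operator builds a new one. The sketch below uses a made-up <code>basket</code> list and a second name, <code>alias</code>, to make the difference visible. It also uses the built-in list method <code>extend</code>, which is not covered above, and shows that <code>remove</code> raises <code>ValueError</code> when the value is missing.
</p>

```python
basket = ["apple", "lemon"]
alias = basket  # a second name for the same list object

basket.append("kiwi")            # modifies the list in place
basket.extend(["fig", "plum"])   # in place, adds each element separately
print(alias)  # ['apple', 'lemon', 'kiwi', 'fig', 'plum']

basket = basket + ["melon"]      # builds a new list and rebinds the name
print(alias)   # ['apple', 'lemon', 'kiwi', 'fig', 'plum']
print(basket)  # ['apple', 'lemon', 'kiwi', 'fig', 'plum', 'melon']

# remove() raises ValueError if the value is not in the list.
try:
    basket.remove("mango")
except ValueError:
    print("mango is not in the basket")

# With duplicates, only the first matching element is removed.
basket.append("apple")
basket.remove("apple")
print(basket)  # ['lemon', 'kiwi', 'fig', 'plum', 'melon', 'apple']
```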

<p id="6b4e">
It’s important to understand how lists work behind the scenes in Python. When you create a new list
<code>
my_list
</code>
, you’re storing the list in your computer memory, and the address of that list is stored in the
<code>
my_list
</code>
variable. The variable
<code>
my_list
</code>
doesn’t contain the elements of the list. It contains a reference to the list. If we copy a list with the equal sign only like this
<code>
my_list_copy = my_list
</code>
, you’ll have the reference copied in the
<code>
my_list_copy
</code>
variable instead of the list values. So, if you want to copy the actual values, you can use the
<code>
list(my_list)
</code>
function or slicing
<code>
[:]
</code>
.
</p>

In [6]:
numbers = [10, 42, 28, 420]
numbers_copy = numbers
numbers_copy[2] = 100
numbers      # [10, 42, 100, 420]
numbers_copy # [10, 42, 100, 420]

ratings = [4.5, 5.0, 3.5, 4.75, 4.00]
ratings_copy = ratings[:]
ratings_copy[0] = 2.0
ratings      # [4.5, 5.0, 3.5, 4.75, 4.0]
ratings_copy # [2.0, 5.0, 3.5, 4.75, 4.0]

characters = ["A", "B", "C"]
characters_copy = list(characters)
characters_copy[-1] = "D"
characters      # ["A", "B", "C"]
characters_copy # ["A", "B", "D"]

['A', 'B', 'D']
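<p>
A caveat that follows from the reference behaviour described above: <code>list(my_list)</code> and <code>[:]</code> copy only the outer list. If the list contains nested lists, the copy still shares them with the original. The sketch below, with a made-up <code>grid</code> list, compares such a shallow copy with <code>copy.deepcopy</code> from the standard library.
</p>

```python
import copy

grid = [[1, 2], [3, 4]]
shallow = list(grid)        # new outer list, same inner lists
deep = copy.deepcopy(grid)  # inner lists are copied as well

grid[0][0] = 99
print(shallow[0][0])  # 99 (the inner list is shared)
print(deep[0][0])     # 1  (fully independent)

# Changes to the outer list itself do not affect the shallow copy.
grid.append([5, 6])
print(len(grid))     # 3
print(len(shallow))  # 2
```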

<h1 id="5c62">
Python Dictionaries
</h1>
<p id="ea53">
Dictionaries are used to store
<strong>
key-value pairs
</strong>
. They are helpful when you want your values to be indexed by
<strong>
unique keys
</strong>
. In Python, you can create a dictionary using
<strong>
curly braces
</strong>
. Also, a key and a value are separated by a
<strong>
colon
</strong>
. If we want to get the value for a given key, we can do it like this:
<code>
our_dict[key]
</code>
.
</p>
<h2 id="d159">
Dictionaries vs Lists
</h2>
<p id="a0f6">
Let’s look at an example that compares lists with dictionaries. Imagine that we have some movies and want to store their ratings. We also want to look up the rating of a movie quickly by its name. We can do this using two lists or one dictionary. In the examples, the
<code>
movies.index("Ex Machina")
</code>
code returns the index of the “Ex Machina” movie.
</p>

In [7]:
movies = ["Ex Machina", "Mad Max: Fury Road", "1408"]
ratings = [7.7, 8.1, 6.8]

movie_choice_index = movies.index("Ex Machina")
print(ratings[movie_choice_index]) # 7.7

7.7


In [8]:
ratings = {
    "Ex Machina": 7.7,
    "Mad Max: Fury Road": 8.1,
    "1408" : 6.8
}

print(ratings["Ex Machina"]) # 7.7

7.7


<p id="b36f">
In this case, the usage of a dictionary is a more intuitive and convenient way to represent the ratings.
</p>
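<p>
If the data already lives in two parallel lists, one possible way to move to a dictionary is to pair the lists with the built-in <code>zip</code> function. This is only a sketch; the list name <code>scores</code> is chosen here to avoid reusing <code>ratings</code> for two different things.
</p>

```python
movies = ["Ex Machina", "Mad Max: Fury Road", "1408"]
scores = [7.7, 8.1, 6.8]

# zip pairs the elements by position; dict turns the pairs into key-value pairs.
ratings = dict(zip(movies, scores))

print(ratings)          # {'Ex Machina': 7.7, 'Mad Max: Fury Road': 8.1, '1408': 6.8}
print(ratings["1408"])  # 6.8
```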
<h2 id="12e9">
Dictionary Operations
</h2>
<p id="4ad9">
We can
<strong>
add
</strong>
,
<strong>
update
</strong>
, and
<strong>
delete
</strong>
data from our dictionaries. To add or update a value, we can simply write
<code>
our_dict[key] = value
</code>
. To delete a key-value pair, we use the del statement:
<code>
del our_dict[key]
</code>
.
</p>

In [9]:
ratings["Deadpool"] = 8.0
print(ratings) # {'Ex Machina': 7.7, 'Mad Max: Fury Road': 8.1, '1408': 6.8, 'Deadpool': 8.0}

ratings["Ex Machina"] = 7.8
print(ratings) # {'Ex Machina': 7.8, 'Mad Max: Fury Road': 8.1, '1408': 6.8, 'Deadpool': 8.0}

del ratings["1408"]
print(ratings) # {'Ex Machina': 7.8, 'Mad Max: Fury Road': 8.1, 'Deadpool': 8.0}

{'Ex Machina': 7.7, 'Mad Max: Fury Road': 8.1, '1408': 6.8, 'Deadpool': 8.0}
{'Ex Machina': 7.8, 'Mad Max: Fury Road': 8.1, '1408': 6.8, 'Deadpool': 8.0}
{'Ex Machina': 7.8, 'Mad Max: Fury Road': 8.1, 'Deadpool': 8.0}


<p id="0054">
We can also check if a given key is in our dictionary like this:
<code>
key in our_dict
</code>
.
</p>


In [10]:
print("Ex Machina" in ratings) # True

True
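<p>
The membership check matters because asking for a missing key with square brackets raises <code>KeyError</code>. The sketch below shows that behaviour together with two built-in dictionary methods not covered above, <code>get</code> and <code>pop</code>. The movie "Alien" is just a made-up missing key.
</p>

```python
ratings = {"Ex Machina": 7.8, "Deadpool": 8.0}

# Square brackets raise KeyError for a missing key.
try:
    ratings["Alien"]
except KeyError as error:
    print("Missing key:", error)  # Missing key: 'Alien'

# get() returns None, or a fallback value, instead of raising.
print(ratings.get("Alien"))       # None
print(ratings.get("Alien", 0.0))  # 0.0

# pop() deletes a key and hands back its value.
removed = ratings.pop("Deadpool")
print(removed)  # 8.0
print(ratings)  # {'Ex Machina': 7.8}
```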


<h1 id="70ce">
Functions
</h1>
<p id="df8c">
A function is a piece of reusable code that solves a specific task. We can write our own functions using the
<code>
def
</code>
keyword like this:
</p>

In [11]:
def is_prime(n):
    if n <= 1:
        return False
    elif n <= 3:
        return True
    elif n % 2 == 0 or n % 3 == 0:
        return False
    current_number = 5
    while current_number * current_number <= n:
        if n % current_number == 0 or n % (current_number + 2) == 0:
            return False
        current_number = current_number + 6
    return True


<p id="b707">
However, there are many built-in functions in Python, such as
<code>
max(iterable [, key])
</code>
,
<code>
min(iterable [, key])
</code>
,
<code>
type(object)
</code>
,
<code>
round(number [, ndigits])
</code>
, etc. So, in many cases when we need a function that solves a given task, we can look for a built-in function or a Python package that already solves it. We don’t have to “
<a href="https://en.wikipedia.org/wiki/Reinventing_the_wheel">
reinvent the wheel
</a>
”.
</p>
<p id="8d75">
Most functions take some input and return some output. The inputs a function accepts are called parameters, and Python matches the arguments passed in a function call to those parameters. In documentation signatures such as round(number [, ndigits]), square brackets around an argument mean that it’s optional.
</p>
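<p>
Here is a short sketch of optional arguments in practice, first with two built-ins and then with a hypothetical function, <code>describe</code>, written only for this example. In your own functions, an argument becomes optional when you give it a default value.
</p>

```python
# ndigits is optional in round(number [, ndigits]).
print(round(3.14159))     # 3
print(round(3.14159, 2))  # 3.14

# key is optional in max(iterable [, key]).
words = ["pear", "fig", "banana"]
print(max(words))           # pear (alphabetical order)
print(max(words, key=len))  # banana (longest word)


def describe(title, rating=None):
    """Return the title, followed by the rating if one is given."""
    if rating is None:
        return title
    return f"{title}: {rating}"


print(describe("1408"))                   # 1408
print(describe("1408", 6.8))              # 1408: 6.8 (matched by position)
print(describe(rating=6.8, title="1408")) # 1408: 6.8 (matched by name)
```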
<p id="8160">
We can use the function
<code>
help([object])
</code>
or
<code>
?function_name
</code>
to see the documentation of any function (the second form works only in IPython and Jupyter). If we’re using Jupyter Notebook, the
<code>
help
</code>
function will show us the documentation in the current cell, while the second option will show us the documentation in the pager.
</p>
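<p>
The <code>help</code> function also works on functions we write ourselves. The sketch below defines a hypothetical <code>area</code> function and then inspects it. It additionally uses <code>inspect.signature</code> from the standard library, which is not mentioned above, as one way to see which parameters have default values.
</p>

```python
import inspect


def area(width, height=1.0):
    """Return the area of a rectangle with the given width and height."""
    return width * height


# Prints the signature and the docstring of our own function.
help(area)

# The signature shows which parameters have defaults (are optional).
print(inspect.signature(area))  # (width, height=1.0)

# Built-in functions can be inspected in the same way.
help(round)
```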
<h1 id="ac88">
Methods
</h1>
<p id="4533">
We’ve seen that we have strings, floats, integers, booleans, etc. in Python. Every value of these types is an object. A method is a function that is available for a given object depending on the object’s type. So, each object has a specific type and a set of methods depending on this type.
</p>

In [13]:
# String methods
text = "Data Science" 
text.upper() # "DATA SCIENCE"
text.lower() # "data science"
text.capitalize() # "Data science"

# List methods
numbers = [1, 4, 0, 2, 9, 9, 10]
numbers.reverse()
print(numbers) # [10, 9, 9, 2, 0, 4, 1]
numbers.sort()
print(numbers) # [0, 1, 2, 4, 9, 9, 10]

# Dictionary methods
ratings = {
    "Ex Machina": 7.7,
    "Mad Max: Fury Road": 8.1,
    "1408" : 6.8
}

print(ratings.keys()) # dict_keys(['Ex Machina', 'Mad Max: Fury Road', '1408'])
print(ratings.values()) # dict_values([7.7, 8.1, 6.8])
print(ratings.items()) # dict_items([('Ex Machina', 7.7), ('Mad Max: Fury Road', 8.1), ('1408', 6.8)])

[10, 9, 9, 2, 0, 4, 1]
[0, 1, 2, 4, 9, 9, 10]
dict_keys(['Ex Machina', 'Mad Max: Fury Road', '1408'])
dict_values([7.7, 8.1, 6.8])
dict_items([('Ex Machina', 7.7), ('Mad Max: Fury Road', 8.1), ('1408', 6.8)])


<p id="3b91">
Objects of different types can have methods with the same name. Depending on the object’s type, these methods behave differently.
</p>

In [14]:
numbers = [10, 30, 55, 40, 8, 30]
text = "Data Science"

numbers.index(8)  # 4
text.index("a")   # 1

numbers.count(30) # 2
text.count("i")   # 1

1

<p id="cef1">
Watch out! Some methods change the object they are called on. For example, the
<code>
append()
</code>
method modifies the list it is called on.
</p>
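<p>
A quick way to see the difference is to compare a method that changes its object with one that returns a new object. In the sketch below, <code>list.sort()</code> modifies the list and returns <code>None</code>, the built-in <code>sorted()</code> leaves the list untouched, and string methods never change the string because strings are immutable.
</p>

```python
numbers = [3, 1, 2]
result = numbers.sort()  # sorts in place
print(result)   # None
print(numbers)  # [1, 2, 3]

original = [3, 1, 2]
ordered = sorted(original)  # returns a new sorted list
print(original)  # [3, 1, 2]
print(ordered)   # [1, 2, 3]

text = "Data Science"
shouted = text.upper()  # returns a new string
print(text)     # Data Science
print(shouted)  # DATA SCIENCE
```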
<h1 id="2ab5">
Packages
</h1>
<p id="2275">
A module is a file containing Python definitions and statements. Modules define functions, methods, and new Python types that solve particular problems.
</p>
<p id="7a4f">
A package is a collection of modules organized in directories. Many packages are available for Python, covering different problems. For example, “NumPy”, “matplotlib”, “seaborn”, and “scikit-learn” are very popular data science packages.
</p>
<ul>
<li id="de87">
“NumPy” is used for efficiently working with arrays
</li>
<li id="1957">
“matplotlib” and “seaborn” are popular libraries used for data visualization
</li>
<li id="2090">
“scikit-learn” is a powerful library for machine learning
</li>
</ul>
<p id="6a4b">
Some packages are available in Python by default, but many packages that we need are not. If we want to use such a package, we have to install it first, for example with pip (the package installer for Python).
</p>
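<p>
One possible way to check from Python whether a package is already installed is <code>importlib.util.find_spec</code> from the standard library. The helper name <code>is_installed</code> and the fake package name below are made up for this sketch.
</p>

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(package_name) is not None


print(is_installed("json"))  # True, json ships with Python
# Expected to be False, assuming no package with this made-up name exists.
print(is_installed("surely_not_a_real_package_name_12345"))

# To install a missing package, run a command like this in a terminal:
#   python -m pip install numpy
```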
<p id="6128">
However, there is also something called “Anaconda”.
</p>
<blockquote>
<p id="3967">
Anaconda Distribution is a free, easy-to-install package manager, environment manager and Python distribution with a collection of 1,000+ open source packages with free community support.
</p>
</blockquote>
<p id="4feb">
So, if you don’t want to install many packages one by one, I recommend using Anaconda. This distribution includes many useful packages.
</p>
<h2 id="38a8">
Import Statements
</h2>
<p id="cbf2">
Once you have installed the needed packages, you can import them into your Python files. We can import an entire package, submodules, or specific functions from it. We can also add an alias for a package. The examples below show the different forms of the import statement.
</p>

In [15]:
import numpy
numbers = numpy.array([3, 4, 20, 15, 7, 19, 0])

In [17]:
import numpy as np # np is an alias for the numpy package
numbers = np.array([3, 4, 20, 15, 7, 19, 0]) # works fine
numbers = numpy.array([3, 4, 20, 15, 7, 19, 0]) # NameError: name 'numpy' is not defined

In [18]:
# import the "pyplot" submodule from the "matplotlib" package with alias "plt"
import matplotlib.pyplot as plt

In [19]:
from numpy import array
numbers = array([3, 4, 20, 15, 7, 19, 0]) # works fine
numbers = numpy.array([3, 4, 20, 15, 7, 19, 0]) # NameError: name 'numpy' is not defined
type(numbers) # numpy.ndarray

numpy.ndarray

<p id="ead5">
We can also do something like this
<code>
from numpy import *
</code>
. The asterisk symbol here means to import everything from that module. This import statement creates references in the current namespace to all public objects defined by the
<code>
numpy
</code>
module. In other words, we can just use all available functions from
<code>
numpy
</code>
using only their names, without a prefix. For example, we can now call NumPy’s absolute function as
<code>
absolute()
</code>
instead of
<code>
numpy.absolute()
</code>
.
<br/>
However, I don’t recommend doing that because:
</p>
<ul>
<li id="3e30">
If we import all functions from some modules like that, the current namespace fills up with many names, and someone reading our code can get confused about which package a specific function comes from.
</li>
<li id="0186">
If two modules have a function with the same name, the second import will override the function of the first.
</li>
</ul>
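<p>
The second problem can be reproduced with two standard-library modules that both define <code>sqrt</code>. The sketch below imports the name explicitly rather than with an asterisk, but the overriding works the same way: the later import wins.
</p>

```python
from math import sqrt
print(sqrt(16))  # 4.0

# cmath also defines sqrt; importing it replaces the previous name.
from cmath import sqrt
print(sqrt(16))  # (4+0j)
print(sqrt(-1))  # 1j (math.sqrt would raise ValueError here)

# Importing the modules themselves keeps both functions available
# and makes it clear where each one comes from.
import math
import cmath
print(math.sqrt(16))   # 4.0
print(cmath.sqrt(-1))  # 1j
```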
<h1 id="e885">
NumPy
</h1>
<p id="6765">
NumPy is a fundamental package for scientific computing with Python. It’s very fast and easy to use. This package helps us to make calculations element-wise (element by element).
</p>
<p id="e9c5">
The regular Python list doesn’t know how to do operations element-wise. Of course, we can use Python lists, but they’re slow, and we need more code to achieve the desired result. A better decision in most cases is to use
<code>
NumPy
</code>
.
</p>
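<p>
The difference is easy to see with multiplication. In the sketch below (the <code>prices</code> list is made up, and NumPy is assumed to be installed), multiplying a list repeats it, so doubling every value needs a loop or a list comprehension, while the NumPy array is doubled directly.
</p>

```python
import numpy as np

prices = [10, 20, 30]

# For a list, * means repetition, not arithmetic.
print(prices * 2)  # [10, 20, 30, 10, 20, 30]

# Element-wise doubling of a list needs explicit iteration.
print([price * 2 for price in prices])  # [20, 40, 60]

# A NumPy array applies the operator to every element.
prices_array = np.array(prices)
print(prices_array * 2)  # [20 40 60]
```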
<p id="f686">
Unlike the regular Python list, a NumPy array always has a single type. If we pass a list with values of different types to
<code>
np.array()
</code>
, we can choose the wanted type using the parameter
<code>
dtype
</code>
. If this parameter is not given, the type will be determined as the minimum type required to hold the objects.
</p>

In [20]:
np.array([False, 42, "Data Science"])       # array(["False", "42", "Data Science"], dtype="<U12")
np.array([False, 42], dtype = int)          # array([ 0, 42])
np.array([False, 42, 53.99], dtype = float) # array([  0.  ,  42.  ,  53.99])

# Invalid converting
np.array([False, 42, "Data Science"], dtype = float) # could not convert string to float: 'Data Science'

ValueError: could not convert string to float: 'Data Science'


<p id="3980">
A NumPy array comes with its own attributes and methods. Remember that operators in Python behave differently on different data types? Well, in NumPy the operators behave element-wise.
</p>

In [24]:
np.array([37, 48, 50]) + 1 # array([38, 49, 51])
np.array([20, 30, 40]) * 2 # array([40, 60, 80])
np.array([42, 10, 60]) / 2 # array([ 21.,   5.,  30.])

np.array([1, 2, 3]) * np.array([10, 20, 30]) # array([10, 40, 90])
np.array([1, 2, 3]) - np.array([10, 20, 30]) # array([ -9, -18, -27])

array([ -9, -18, -27])
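<p>
Comparison operators are element-wise too, and that is where booleans become useful for filtering. The sketch below uses a made-up <code>heights_cm</code> array: the comparison yields an array of booleans, which can then be used to select the matching elements.
</p>

```python
import numpy as np

heights_cm = np.array([150, 172, 181, 165])

# The comparison is applied to each element and gives a boolean array.
is_tall = heights_cm > 170
print(is_tall)  # [False  True  True False]

# Indexing with the boolean array keeps only the True positions.
print(heights_cm[is_tall])  # [172 181]
```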

<p id="d692">
If we check the type of a NumPy array the result will be
<code>
numpy.ndarray
</code>
. Ndarray means n-dimensional array. In the examples above we used 1-dimensional arrays, but nothing stops us from creating arrays with 2, 3, 4, or more dimensions. Subsetting works regardless of how many dimensions an array has. I’ll show you some examples with a 2-dimensional array.
</p>

In [21]:
numbers = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12]
])

numbers[2, 1]     # 8
numbers[-1, 0]    # 10
numbers[0]        # array([1, 2, 3])
numbers[:, 0]     # array([ 1,  4,  7, 10])
numbers[0:3, 2]   # array([3, 6, 9])
numbers[1:3, 1:3] # array([[5, 6],[8, 9]])

array([[5, 6],
       [8, 9]])

<p id="c00a">
If we want to see how many dimensions our array has and how many elements each dimension contains, we can use the
<code>
shape
</code>
attribute. For 2-dimensional arrays, the first element of the tuple is the number of rows and the second is the number of columns.
</p>

In [22]:
numbers = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
    [13, 14, 15]
])

numbers.shape # (5, 3)

(5, 3)
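<p>
Because <code>shape</code> is a regular tuple, it can be unpacked into separate variables. The sketch below also shows two related array attributes, <code>ndim</code> and <code>size</code>, and a 3-dimensional array created with <code>np.zeros</code> purely for illustration.
</p>

```python
import numpy as np

numbers = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

rows, columns = numbers.shape  # unpack the tuple
print(rows, columns)  # 2 3
print(numbers.ndim)   # 2 (number of dimensions)
print(numbers.size)   # 6 (total number of elements)

# The shape tuple has one entry per dimension.
cube = np.zeros((2, 3, 4))
print(cube.shape)  # (2, 3, 4)
print(cube.ndim)   # 3
```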

<h2 id="bfb9">
Basic Statistics
</h2>
<p id="3628">
The first step in analyzing data is to get familiar with it. NumPy has a lot of functions that help us do that. We’ll see some basic functions for computing statistics on our data.
</p>
<ul>
<li id="7c8f">
<code>
np.mean()
</code>
- returns the arithmetic mean (the sum of the elements divided by the number of elements).
</li>
<li id="ac08">
<code>
np.median()
</code>
- returns the median (the middle value of a sorted copy of the passed array, if the length of array is even - the average of the two middle values will be computed)
</li>
<li id="3e99">
<code>
np.corrcoef()
</code>
- returns a correlation matrix. This function is useful when we want to see if there is a correlation between two variables in our dataset, or in other words, between two arrays of the same length.
</li>
<li id="fefa">
<code>
np.std()
</code>
- returns the standard deviation
</li>
</ul>

In [23]:

learning_hours = [1, 2, 6, 4, 10]
grades = [3, 4, 6, 5, 6]

np.mean(learning_hours)   # 4.6
np.median(learning_hours) # 4.0
np.std(learning_hours)    # 3.2
np.corrcoef(learning_hours, grades) # [[ 1.          0.88964891][ 0.88964891  1.        ]]

array([[1.        , 0.88964891],
       [0.88964891, 1.        ]])

<p id="dbfc">
From the example above, we can see that there is a high correlation between the hours of learning and the grade.
<br/>
Also, we can see that:
</p>
<ul>
<li id="9442">
the mean for the learning hours is 4.6
</li>
<li id="d496">
the median for the learning hours is 4.0
</li>
<li id="4c52">
the standard deviation for the learning hours is 3.2
</li>
</ul>
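<p>
To see where these numbers come from, the mean and the standard deviation can be recomputed by hand with element-wise operations. The sketch below also notes that <code>np.std</code> divides by the number of elements by default (the population standard deviation). Passing <code>ddof=1</code> divides by one less and gives the sample standard deviation instead.
</p>

```python
import numpy as np

learning_hours = np.array([1, 2, 6, 4, 10])

# Mean: the sum of the elements divided by how many there are.
mean = learning_hours.sum() / learning_hours.size
print(mean)  # 4.6

# Population standard deviation: the square root of the
# average squared distance from the mean.
deviations = learning_hours - mean
population_std = np.sqrt((deviations ** 2).mean())
print(np.isclose(population_std, np.std(learning_hours)))  # True

# Sample standard deviation (divides by n - 1), about 3.58 here.
sample_std = np.std(learning_hours, ddof=1)
print(sample_std)
```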
<p id="bd51">
NumPy also has some basic functions like
<code>
np.sort()
</code>
and
<code>
np.sum()
</code>
, which have counterparts for regular Python lists (the built-in sorted() and sum() functions). An important note here is that NumPy enforces a single type in an array, and this speeds up the calculations.
</p>
<h1 id="78b2">
Exercises
</h1>
<p id="35ed">
I have prepared some exercises including subsetting, element-wise operations, and basic statistics. If you want, you can try to solve them.
</p>
<ul>
<li id="ca17">
<a href="https://gist.github.com/Ventsislav-Yordanov/fb14e1c52a2b4d18422dc7075d5a09cb">
Subsetting Python list
</a>
</li>
<li id="9d69">
<a href="https://gist.github.com/Ventsislav-Yordanov/3ef4587b274643a161e4bea0952950ab">
Subsetting 2-dimensional NumPy array
</a>
</li>
<li id="19e1">
<a href="https://gist.github.com/Ventsislav-Yordanov/48a5e73ae43c668e2f2f725a503ca96c">
NumPy element-wise operations
</a>
</li>
<li id="9267">
<a href="https://gist.github.com/Ventsislav-Yordanov/8b7e47cda7b63bf97cfa5c80f05f50b3">
NumPy basic statistics
</a>
</li>
</ul>
<h1 id="1164">
Other Blog Posts by Me
</h1>
<ul>
<li id="ef19">
<a href="https://medium.com/@ventsislav94/jypyter-notebook-shortcuts-bf0101a98330">
Jupyter Notebook shortcuts
</a>
.
</li>
<li id="a267">
<a href="/python-basics-iteration-and-looping-6ca63b30835c">
Python Basics: Iteration and Looping
</a>
</li>
<li id="ad18">
<a href="/python-basics-list-comprehensions-631278f22c40">
Python Basics: List Comprehensions
</a>
</li>
<li id="a2a1">
<a href="/data-science-with-python-intro-to-data-visualization-and-matplotlib-5f799b7c6d82">
Data Science with Python: Intro to Data Visualization with Matplotlib
</a>
</li>
<li id="84db">
<a href="/data-science-with-python-intro-to-loading-and-subsetting-data-with-pandas-9f26895ddd7f">
Data Science with Python: Intro to Loading, Subsetting, and Filtering Data with pandas
</a>
</li>
<li id="25c7">
<a href="/introduction-to-natural-language-processing-for-text-df845750fb63">
Introduction to Natural Language Processing for Text
</a>
</li>
</ul>
<h1 id="79d5">
LinkedIn
</h1>
<p id="517f">
Here is
<a href="https://www.linkedin.com/in/ventsislav-yordanov-a657b086/">
my LinkedIn profile
</a>
in case you want to connect with me. I’ll be happy to be connected with you.
</p>
<h1 id="14a3">
Final Words
</h1>
<p id="3d6a">
Thank you for reading. If you like this post, please hold the clap button and share it with your friends. Also, I’ll be happy to hear your feedback. If you want to be notified when I create a new blog post, you can subscribe to
<a href="https://buttondown.email/Ventsislav">
my newsletter
</a>
.
</p>