# **Introduction to Python (Part 1)**

**Python** is a high-level, versatile, and easy-to-read programming language that is widely used in data science, web development, automation, and software engineering. It emphasizes code readability with a simple syntax, making it an excellent choice for beginners. Python supports multiple programming paradigms—including procedural, object-oriented, and functional programming—and has a vast ecosystem of libraries and tools that make it powerful for advanced applications in research and industry. For a master's level student, Python offers a strong foundation for both academic and practical problem-solving in computing.

---
## **1. Fundamentals of Programming in Python**

**NumPy** is a foundational Python library for numerical computing. It provides:
- fast multi-dimensional arrays (ndarray),
- mathematical operations (linear algebra, statistics, etc.),
- efficient data storage and vectorized operations.

**Pandas** is built on NumPy and is designed for data manipulation and analysis. It provides:
- DataFrames (tabular data) and Series (1-D labelled arrays),
- tools for cleaning, transforming, and analyzing structured data,
- time series handling, I/O (CSV/Excel), and merging of datasets.

In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd

---
### **1.1 Variable Assignment**

In mathematics and statistics, a variable represents a symbol that can take on different values. In Python, a variable is a name that refers to a value (or object) stored in memory. You can use a variable to store:

- Primitive data (like numbers or strings)
- Data structures (like lists or dictionaries)
- Functions
- Objects from custom classes
- Even plots or models (via external libraries)

You can later use the variable’s name to retrieve or manipulate what it refers to.

A variable name must start with a letter (lowercase or uppercase) or an underscore (_). Variable names cannot be a Python keyword (e.g., if, else, for, while, etc.).
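These rules can be checked directly with the built-in `str.isidentifier()` method and the standard-library `keyword` module. Below is a minimal sketch on a few candidate names, chosen purely for illustration. Note also that names are case-sensitive, so `area` and `Area` are different variables.

```python
import keyword

# Candidate names, chosen for illustration only
candidates = ["area_circle", "_temp", "Radius2", "2radius", "for", "my-var"]

# A usable name must be a valid identifier AND not a reserved keyword
results = {name: name.isidentifier() and not keyword.iskeyword(name)
           for name in candidates}
for name, valid in results.items():
    print(f"{name!r:15} valid: {valid}")

# Variable names are case-sensitive: these are two different variables
area = 1
Area = 2
print(area, Area)
```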

Python uses the = symbol for assignment. In the example below, the number 35 is assigned to the variable x. This means “let x point to the object 35.” Note that when you run this command, the object or variable x is created but nothing will be displayed.

In [None]:
# Assign a value to x
x = 35

To recall the value assigned to variable x, simply run the variable name as a command: in a notebook, the value of the last expression in a cell is displayed automatically.

In [None]:
# Display x
x

You can overwrite the stored value by assigning another value to the same variable. For example:

In [None]:
# Assign a new value to x and display
x = 18
x

Suppose we want to add 8 to x and store the sum in a new object, say y. This can be done as follows.

In [None]:
# Adds 8 to the value of x and stores the result in y
y = x + 8
y

Suppose now we have the linear equation y = 3x + 5. Since x = 18, we have:

In [None]:
# Multiplies x by 3, adds 5, and stores the result in y
y = 3 * x + 5
y

---

When coding, it is always a good idea to give variables meaningful, ideally self-explanatory, names. Meaningful names help the reader and reduce the number of comments required. For example, suppose we want to calculate the area of a circle (i.e. $\pi r^2$), given that its diameter is 10 cm:

In [None]:
# Area of a circle given diameter
diam = 10  # diameter = 10 cm
radius = diam / 2  # radius is half of the diameter
area_circle = np.pi * radius ** 2  # area of the circle
area_circle

---

### **1.2 Data Types**

Python has five basic built-in data types for single values that are usually introduced first:
- Integer (int)
- Float (float)
- Boolean (bool)
- String (str)
- Complex (complex)

However, in this session we focus on the four most commonly used: **Integer, Float, Boolean and String**. The *complex* type, such as 2 + 3j, handles numbers that have both real and imaginary parts and is not discussed in detail here.


#### **1.2.1 Integer (whole numbers): int**
The *Integer* type is used for whole numbers such as 5 or -100.

In [None]:
5 # Positive whole number as a constant

In [None]:
-3 # Negative whole number as constant

In [None]:
x = 5 # Assigning a whole number makes x an int by default
x

In [None]:
type(x) # Confirming the data type
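Integers support a few operators worth seeing early. In Python 3, `/` always returns a float, even for two integers, whereas `//` (floor division) and `%` (remainder) keep integer results. Python integers also have arbitrary precision, so they do not overflow. A short sketch:

```python
q = 7 / 2        # true division always returns a float
fq = 7 // 2      # floor division rounds down to a whole number
r = 7 % 2        # remainder after floor division
neg = -7 // 2    # floor division rounds toward negative infinity
big = 2 ** 100   # Python ints grow as large as memory allows

print(q, type(q))
print(fq, r, neg)
print(big)
```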

#### **1.2.2 Floating-point numbers: float**
The float type represents real numbers with decimal points, like 3.14 or -0.01.

In [None]:
5.5 # Positive floating-point constant

In [None]:
-2.75 # Negative floating-point constant

In [None]:
np.pi # Display the value of π using NumPy (imported as 'np')

We can also perform mathematical operations on integers and floats, just as you would on a calculator. Only *float* examples are shown below.

In [None]:
# Basic arithmetic operations
5.5 + 2.7  # Addition

In [None]:
5.5 - 2.7  # Subtraction

In [None]:
y = 5.5 / 2; y  # Division

In [None]:
type(y) # float: y was computed from a float and an int

In [None]:
5.5 * 2  # Multiplication

In [None]:
5.5 ** 2  # Squaring

In [None]:
5.5 ** 4  # To the power of 4

In [None]:
np.sqrt(5.5)  # Square root

In [None]:
np.exp(5.5)  # Exponential

In [None]:
np.log(5.5)  # Natural log
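One caveat with floats: they are stored in binary with finite precision, so some decimal values cannot be represented exactly. Comparing floats with `==` can therefore be surprising. `math.isclose` (or `np.isclose` for NumPy values) compares within a tolerance instead:

```python
import math
import numpy as np

total = 0.1 + 0.2
print(total)                     # shows a tiny binary rounding error
print(total == 0.3)              # exact comparison fails
print(math.isclose(total, 0.3))  # tolerance-based comparison succeeds

# Inverse functions recover the input only up to rounding, so compare with a tolerance
roundtrip = np.exp(np.log(5.5))
print(np.isclose(roundtrip, 5.5))
```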

#### **1.2.3 Logical Operators (Boolean)**
A logical (Boolean) value indicates whether an item or statement is True or False. Python provides several operators for performing logical operations; the main ones are shown below.

In [None]:
# Basic Logical Operators
True and False # Logical AND returns False

In [None]:
True or False # Logical OR returns True

In [None]:
not True # Logical NOT returns False

In [None]:
# Bitwise Logical Operators (for integers). These perform logical operations bit-by-bit:
5 & 3  # & is Bitwise AND, Returns 1 (0101 & 0011 = 0001)

In [None]:
5 | 3  # | is Bitwise OR, Returns 7 (0101 | 0011 = 0111)

In [None]:
5 ^ 3  # ^ is Bitwise XOR, Returns 6 (0101 ^ 0011 = 0110)

In [None]:
~5  # ~ is Bitwise NOT, Returns -6 (inverts all bits)

*Comparison* operators return True or False (a Boolean):

- `==` : equal to
- `!=` : not equal to
- `>` : greater than
- `<` : less than
- `>=` : greater than or equal to
- `<=` : less than or equal to

In [None]:
# Logical comparisons Examples
5 == 5  # Does 5 equal 5?

In [None]:
5 == 2  # Does 5 equal 2?

In [None]:
5 != 2  # Does 5 not equal 2?

In [None]:
5 > 2  # Is 5 greater than 2?

In [None]:
5 >= 2  # Is 5 greater than or equal to 2?

In [None]:
5 < 2  # Is 5 less than 2?

In [None]:
5 <= 2  # Is 5 less than or equal to 2?
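Python also lets comparisons be *chained*: `a < b <= c` is evaluated as `(a < b) and (b <= c)`, which reads naturally for range checks. A small sketch with an illustrative value:

```python
value = 18  # illustrative value

in_range = 10 < value <= 20                 # chained comparison
explicit = (10 < value) and (value <= 20)   # the equivalent long form
print(in_range, explicit)
```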

In [None]:
# Logical operations combining logical operators and comparison operators
(5 > 2) and (5 > 4)

In [None]:
(5 > 2) and (5 < 4)

In [None]:
(5 > 2) or (5 < 4)
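The same combination is common in data work, but on whole arrays of values. NumPy (and pandas) use `&`, `|` and `~` for *element-wise* logic on Boolean arrays, whereas `and`/`or`/`not` expect a single True/False and raise an error for a multi-element array. A sketch on illustrative ages:

```python
import numpy as np

ages = np.array([25, 42, 17, 63])   # illustrative data
is_adult = ages >= 18               # element-wise comparison -> Boolean array
is_senior = ages >= 60

adult_not_senior = is_adult & ~is_senior  # element-wise AND / NOT
adult_or_senior = is_adult | is_senior    # element-wise OR
print(adult_not_senior, adult_or_senior)

# & binds more tightly than >=, so wrap each comparison in parentheses
working_age = (ages >= 18) & (ages < 60)
print(working_age)

# `and` needs a single truth value, so it fails on a multi-element array
raised = False
try:
    is_adult and is_senior
except ValueError as err:
    raised = True
    print("ValueError:", err)
```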

In [None]:
# Identity Operators
# is: True if both names refer to the same object
x is y # x and y refer to different objects, so False is returned

In [None]:
x is not y # 'is not' - True if different objects
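The difference between `==` (same value) and `is` (same object) is easiest to see with lists, which are separate objects even when their contents match. In practice, `is` is mainly used to test for `None`. The list names below are illustrative:

```python
list_a = [1, 2, 3]
list_b = [1, 2, 3]   # equal contents, but a separate object
list_c = list_a      # a second name for the same object

print(list_a == list_b)  # same values
print(list_a is list_b)  # different objects
print(list_a is list_c)  # same object

result = None
print(result is None)    # idiomatic check for "no value"
```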

In [None]:
# Membership Operators
a = [1,2,3]
b = 2
b in a # in - True if value found in sequence

In [None]:
b not in a # 'not in' - True if value not found in sequence

#### **1.2.4 Strings (Textual data): str**
Text in Python, even a single character, is represented as a **string** value *(str)*, defined using quotes (" " or ' '). A string can contain numbers (e.g. "1", "2", "3"), letters (e.g. "a", "b", "C"), symbols (e.g. "#", "$", "@"), or a combination of these, such as words or sentences. It’s important to note that if a number is stored as a string, you cannot perform mathematical operations on it directly. For example, "5" is a string, not a number, so trying to add it to an integer raises a TypeError. To use it in calculations, you must first convert it to a numeric type using functions like int() or float().
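The behaviour described above can be sketched as follows. The failing addition is wrapped in `try`/`except` so the cell keeps running:

```python
text = "5"

failed = False
try:
    text + 3                     # str + int is not allowed
except TypeError as err:
    failed = True
    print("TypeError:", err)

as_int = int(text) + 3           # convert first, then add
as_float = float(text) * 1.5
joined = text + str(3)           # str + str concatenates instead of adding
print(as_int, as_float, joined)
```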

In [None]:
# String examples
'a'

In [None]:
"abc123"

In [None]:
'apples'

In [None]:
'I hate apples'

In [None]:
# Several expressions separated by ; in one cell: only the last one is displayed
'a'; 'abc' ; 'apples'; 'I hate apples'

In [None]:
# Use print() to display several values from the same cell; 'apples' is neither printed nor the last expression, so it is not shown
print('a'); print("abc") ; 'apples'; 'I hate apples'
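Strings are sequences, so they support indexing, slicing, `len()` and many built-in methods. f-strings (prefixed with `f`) are a convenient way to insert values into text. A few common operations on an example sentence:

```python
sentence = 'I hate apples'

first = sentence[0]           # first character (0-indexed)
last_word = sentence[-6:]     # last six characters
n_chars = len(sentence)       # length, spaces included
upper = sentence.upper()      # all upper case
words = sentence.split()      # split on whitespace into a list of words
swapped = sentence.replace('hate', 'love')

n_apples = 3
message = f"I have {n_apples} apples"   # f-string inserts the value of n_apples
print(first, last_word, n_chars, upper, words, swapped, message, sep='\n')
```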

---
### **1.3 Data Structures**
Data structures are tools for holding multiple values. In data analysis, one typically works with groups of numbers and/or characters, and rarely with single values one at a time. Data structures allow us to store and manipulate groups of values simultaneously. The data structures we discuss here are lists, arrays, categorical data (levels), matrices, and data frames; all except lists come from the NumPy and pandas libraries.

#### **1.3.1 Creating sequences (Lists and arrays)**
A *list* is an ordered sequence whose elements can be of the same or different data types.

In [None]:
[]  # A null (empty) list

In [None]:
a = [1, 2, 3]  # A numeric list
a

In [None]:
['a', 'b', 'c']  # A character list

In [None]:
[True, False, False]  # A logical list

Let us create a *list* containing the integers from 1 to 5.

In [None]:
# Create a list from 1 to 5
a = list(range(1, 6))
a

We use square brackets, i.e. [.], if we wish to access particular element(s) within the list, rather than the whole list. Remember, Python is 0-indexed, which means that the first element in the list is indexed as 0.

In [None]:
# Accessing elements in a list (Python is 0-indexed)
a[1]  # 2nd element

In [None]:
a[2]  # 3rd element

In [None]:
a[2:5]  # 3rd to 5th element

In [None]:
[a[1], a[4]]  # 2nd and 5th elements
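Slices follow the pattern `start:stop:step`, where `stop` is *excluded*. That is why `a[2:5]` returns the 3rd to 5th elements (indices 2, 3 and 4). Negative indices count from the end. The list below holds the same values as `a` but is redefined under a new name so the sketch runs on its own:

```python
nums = list(range(1, 6))   # [1, 2, 3, 4, 5], same values as a

print(nums[-1])    # last element
print(nums[:2])    # first two elements
print(nums[3:])    # from index 3 to the end
print(nums[::2])   # every second element
print(nums[::-1])  # reversed copy
```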

In [None]:
z = [1 , 'a', True]; z # list of different data types

In [None]:
type(z) # confirming the data type of z

In [None]:
# Create a new list b containing the integers 6, 7, 8, 9, 10
b = list(range(6, 11))
b

In [None]:
# Combine lists a and b
c = a + b
c

In [None]:
c = a * 4; c # Repeat list 'a' four times (concatenating four copies), assign the result to 'c', then display 'c'

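**Pro Tip:** A variable name simply refers to an object in memory, so assigning a list to a second name does not copy it. Since lists are mutable, a change made through one name is visible through the other:

In [None]:
p = [1, 2, 3]
q = p        # q refers to the same list as p
q.append(4)
p            # Display p, Output: [1, 2, 3, 4]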
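If you need an independent list rather than a second name for the same one, make a copy. `list.copy()` (or `list(...)`, or the slice `[:]`) creates a *shallow* copy: a new outer list whose elements are the same objects. For nested lists, `copy.deepcopy` from the standard library also copies the inner lists. A minimal sketch with illustrative names:

```python
import copy

original = [1, 2, 3]
alias = original            # same object, no copy
clone = original.copy()     # new list with the same elements

clone.append(99)
print(original, clone)      # appending to the copy leaves the original unchanged

nested = [[1, 2], [3, 4]]
shallow = nested.copy()
deep = copy.deepcopy(nested)
nested[0].append(5)         # mutate an inner list
print(shallow[0], deep[0])  # the shallow copy shares inner lists; the deep copy does not
```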

---
To perform element-wise numerical operations, we need to convert lists into NumPy **arrays**. Arrays are *ordered collections* of items of a single data type stored in *contiguous memory*. Unlike lists, arrays are optimized for numerical operations.

In [None]:
# Element-wise operations using numpy arrays
a_np = np.array(a)
b_np = np.array(b)
a_np * 0.25  # Multiply elements of a by 0.25

In [None]:
a_np + b_np  # Add elements of a and b

In [None]:
b_np - a_np  # Subtract elements of a from b

In [None]:
a_np * b_np  # Multiply elements of a and b

In [None]:
b_np / a_np  # Divide elements of b by elements of a
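The same operator can mean different things for lists and arrays, which is a common source of bugs: on a list, `*` repeats and `+` concatenates, while on a NumPy array both act element by element. A short comparison:

```python
import numpy as np

lst = [1, 2, 3]
arr = np.array(lst)

print(lst * 2)    # list repetition
print(arr * 2)    # element-wise multiplication
print(lst + lst)  # list concatenation
print(arr + arr)  # element-wise addition
print(arr.dtype)  # one data type shared by every element
```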

#### **1.3.2 Categorical Data in Python using Pandas**
Categorical data in pandas refers to a data type used for variables that take on a limited, fixed set of possible values (categories). Examples include:

- Gender: ['Male', 'Female', 'Other']
- Product categories: ['Electronics', 'Clothing', 'Furniture']
- Survey ratings: ['Low', 'Medium', 'High']

In [None]:
f1 = pd.Categorical([1, 2, 3, 4, 5])
f1

In [None]:
f2 = pd.Categorical(['Male', 'Female', 'Female', 'Male', 'Female'])
f2

In [None]:
f3 = pd.Categorical(['L', 'M', 'H', 'H', 'M', 'L'])
f3

By default, the categories are the sorted unique values (alphabetical order for strings), and the categorical is *unordered*, i.e. no category is considered greater than another. The order can be set explicitly when needed.

In [None]:
# Reorder the levels of f3
f3 = pd.Categorical(f3, categories=['L', 'M', 'H'], ordered=True)
f3
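Setting `ordered=True` matters because an ordered categorical stores each value as an integer code that follows the declared order, and it supports comparisons and `min()`/`max()` based on that order. A minimal sketch that rebuilds the same ratings as `f3` under the illustrative name `ratings`:

```python
import pandas as pd

ratings = pd.Categorical(['L', 'M', 'H', 'H', 'M', 'L'],
                         categories=['L', 'M', 'H'], ordered=True)

print(ratings.codes)   # integer codes follow the declared order: L=0, M=1, H=2
print(ratings > 'L')   # element-wise comparison using that order
print(ratings.max())   # highest category present
```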

#### **1.3.3 Matrices**
A matrix is a two-dimensional rectangular array of numbers, symbols, or expressions arranged in rows and columns. It is a fundamental structure in linear algebra, widely used in mathematics, physics, engineering, and data science (e.g., machine learning, image processing).

We can create a matrix using NumPy's **arange** function together with **reshape**.

In [None]:
# Create a 3x3 matrix, filled column-wise (order='F'; NumPy's default is row-wise)
Mat_A = np.arange(1, 10).reshape((3, 3), order='F')
Mat_A

In [None]:
# Create a 3x3 matrix, filled row-wise (order='C', NumPy's default)
Mat_B = np.arange(1, 10).reshape((3, 3), order='C')
Mat_B

In [None]:
# Without an order argument, reshape fills row-wise (the same result as Mat_B)
Mat_default = np.arange(1, 10).reshape((3, 3))
Mat_default

Another way to create a matrix is by **binding multiple arrays** of the same length together. We can bind the arrays together as columns using the **np.column_stack(.)** function, or as rows using the **np.vstack(.)** function (the older alias np.row_stack is deprecated as of NumPy 2.0).

In [None]:
# Bind arrays as columns and rows
a1 = np.array([1, 2, 3])  # Array 1
a2 = np.array([4, 5, 6])  # Array 2
a3 = np.array([7, 8, 9])  # Array 3
Mat_A = np.column_stack((a1, a2, a3))  # as columns
Mat_A

In [None]:
Mat_B = np.vstack((a1, a2, a3))  # as rows
Mat_B

Two matrices can be multiplied element by element only if they have the same dimensions (NumPy's broadcasting rules relax this for certain compatible shapes). Since Mat_A and Mat_B are both 3 × 3 matrices, we can multiply them element-wise with the * operator. Because each entry is simply multiplied by the matching entry, element-wise multiplication is commutative: A * B = B * A.

In [None]:
# Element-wise multiplication
Mat_A * Mat_B

In [None]:
# Element-wise multiplication
Mat_B * Mat_A

Matrix multiplication, say A × B, works differently: the number of columns in matrix A must equal the number of rows in matrix B, and each entry of the result is obtained by multiplying a row of A by a column of B element by element and summing the products, i.e. $(AB)_{ij} = \sum_k A_{ik} B_{kj}$. Khan Academy has an online tutorial on manual matrix multiplication. In Python, matrix multiplication is performed with the @ operator. In general, A × B ≠ B × A.

In [None]:
# Matrix multiplication
Mat_A @ Mat_B  # A matrix multiply B

In [None]:
Mat_B @ Mat_A  # B matrix multiply A
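The row-by-column rule described above can be checked by hand for a single entry. This sketch rebuilds Mat_A and Mat_B from the same three arrays as the binding cells above, computes entry (0, 0) of Mat_A @ Mat_B as an explicit sum, and confirms that reversing the order gives a different product:

```python
import numpy as np

a1, a2, a3 = np.array([1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])
Mat_A = np.column_stack((a1, a2, a3))   # arrays as columns
Mat_B = np.vstack((a1, a2, a3))         # arrays as rows

AB = Mat_A @ Mat_B
i, j = 0, 0
# Entry (i, j): multiply row i of A by column j of B and sum over k
manual = sum(Mat_A[i, k] * Mat_B[k, j] for k in range(Mat_A.shape[1]))
print(AB[i, j], manual)

# Order matters: A @ B and B @ A are different matrices here
print(np.array_equal(AB, Mat_B @ Mat_A))
```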

To access the elements of a matrix, say A, we specify the corresponding row(s) and column(s) within square brackets, i.e. A[row(s), column(s)]. To select all rows, use : in the row position (e.g. A[:, 2]); likewise, A[0, :] selects all columns of the first row.

Here are some examples.

In [None]:
# Accessing elements in a matrix (Python is 0-indexed)
Mat_A[1, 2]  # element in row 2, column 3

In [None]:
Mat_A[0:2, 2]  # rows 1 and 2, column 3

In [None]:
Mat_A[0, 1:3]  # row 1, columns 2 and 3

In [None]:
Mat_A[0, :]  # all elements in row 1

In [None]:
Mat_A[:, 2]  # all elements in column 3

In [None]:
Mat_A[:, [0, 2]]  # all elements in columns 1 and 3

In [None]:
Mat_A[:, 1:]  # all data except column 1
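Beyond row and column indexing, a few attributes and selections are used constantly: `.shape` gives the dimensions, `.T` the transpose, and a Boolean condition inside the brackets selects the matching elements (returned as a 1-D array in row order). A sketch on a matrix built the same way as Mat_A:

```python
import numpy as np

Mat_A = np.column_stack(([1, 2, 3], [4, 5, 6], [7, 8, 9]))

print(Mat_A.shape)       # (rows, columns)
print(Mat_A.T)           # transpose: rows become columns
print(Mat_A[Mat_A > 5])  # elements greater than 5, as a 1-D array
```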

#### **1.3.4 DataFrame**
A DataFrame is a two-dimensional, tabular data structure (like a spreadsheet or SQL table) provided by the pandas library. It organizes data in rows and columns, where each column can hold a different data type (e.g., numbers, strings, dates). However, all columns must have the same length, and each column normally holds a single data type (its *dtype*). DataFrames are the main structure for data manipulation and analysis with pandas.

In [None]:
# Create lists for a DataFrame
Name = ['John', 'Sarah', 'Zach', 'Beth', 'Lachlan']  # Name - list of strings
Age = [35, 28, 33, 55, 43]  # Age - list of integers
Gender = pd.Categorical(['Male', 'Female', 'Male', 'Female', 'Male'])  # Gender - categorical

Now we can combine the three variables to form a DataFrame.

In [None]:
# Create a DataFrame
df = pd.DataFrame({'Name': Name, 'Age': Age, 'Gender': Gender})
df

We can add new column(s) to an existing DataFrame either by direct assignment or with the pd.concat(.) function.

In [None]:
# Add new column to the DataFrame (Method 1)
Coffee_Drinker = [True, True, False, True, False]  # Drinks coffee? - list of Booleans
df1 = df.copy()
df1['Coffee_Drinker'] = Coffee_Drinker
df1

In [None]:
# Add new column to the DataFrame (Method 2)
df2 = pd.concat([df, pd.Series(Coffee_Drinker, name='Coffee_Drinker')], axis=1)
df2

DataFrames can be indexed by position much like matrices, using .iloc[rows, columns]; .loc[rows, columns] selects by row and column labels instead.

In [None]:
# Accessing rows and columns in DataFrame
df.iloc[[0], 0:3]  # 1st row, columns 1-3

In [None]:
df.iloc[1:3, :]  # rows 2 and 3, all columns

In [None]:
df.iloc[:, [0, 2]]  # all rows, columns 1 and 3

In [None]:
# Accessing columns by name
df['Name']

In [None]:
df[['Name', 'Gender']]

In [None]:
# Access and display columns (Another way)
df.Name

In [None]:
df.loc[:, 'Age']

In [None]:
df.iloc[:, 2]  # All rows, Third column (position 2)
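Rows can also be selected by a condition instead of by position: a Boolean Series inside `df[...]` or `.loc[...]` keeps the rows where it is True. This sketch builds a copy of the example data under the illustrative name `people` so it runs on its own. As with NumPy arrays, conditions are combined with `&`/`|`, with each one wrapped in parentheses.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Zach', 'Beth', 'Lachlan'],
    'Age': [35, 28, 33, 55, 43],
    'Gender': pd.Categorical(['Male', 'Female', 'Male', 'Female', 'Male']),
})

over_40 = people[people['Age'] > 40]   # rows where Age > 40, all columns
males_over_30 = people.loc[(people['Age'] > 30) & (people['Gender'] == 'Male'),
                           ['Name', 'Age']]   # matching rows, selected columns
print(over_40)
print(males_over_30)
```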

In [None]:
# Add new variables to df and display
df['Coffee_Drinker'] = Coffee_Drinker
df

In [None]:
df['Diabetes'] = pd.Categorical(['Yes', 'No', 'No', 'No', 'Yes'])
df

We can inspect the structure of the DataFrame using the info() method.

In [None]:
# Structure of DataFrame
df.info()

In [None]:
# Convert between DataFrame and 'Dict'
df3 = df.to_dict(); df3 # as a dictionary (dict)

In [None]:
df4 = pd.DataFrame(df3); df4  # as DataFrame

---
#### **1.3.5 Advanced list operations**
A list can hold other data structures, such as lists, matrices and DataFrames, as its elements. For example:

In [None]:
# Lists in Python (can contain different types)
list1 = [c, Mat_A, df]
list1

In [None]:
# Examine the structure of the list and its components
for i, item in enumerate(list1):
    print(f'Component {i+1} type: {type(item)}')

In [None]:
# Access list components
list1[0]  # vector c

In [None]:
list1[1]  # matrix Mat_A

In [None]:
list1[2]  # data frame df

In [None]:
# Create a dictionary (named list) in Python
list1 = {'VecC': c, 'MatA': Mat_A, 'DatFrame': df}
list1

In [None]:
# Examine the structure of the dictionary
for k, v in list1.items():
    print(f'{k}: {type(v)}')

In [None]:
# Access dictionary components
list1['VecC']  # vector component

In [None]:
list1['MatA']  # matrix component

In [None]:
list1['DatFrame']  # data frame component
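Dictionaries are also easy to extend and query. A short sketch with a hypothetical dictionary of course details (keys and values are made up for illustration):

```python
course = {'code': 'PY101', 'weeks': 12}   # hypothetical example data

course['lecturer'] = 'TBA'                # add a new key-value pair
course['weeks'] = 13                      # overwrite an existing value

print(list(course.keys()))                # keys keep insertion order
print('weeks' in course)                  # membership tests look at keys
print(course.get('room', 'not set'))      # .get returns a default for a missing key
for key, value in course.items():
    print(key, '->', value)
```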

---
### **1.4 Type Conversion**

Python allows both implicit and explicit type conversion. However, Python generally performs implicit conversion only between numeric types (bool, int, float, complex); it will not, for example, silently turn a string into a number.

#### **1.4.1 Explicit Type Conversion**

In [None]:
int("42")        # String → Integer → 42

In [None]:
float(True)      # Boolean → Float → 1.0

In [None]:
str(3.14)        # Float → String → "3.14"

In [None]:
list((1, 2))     # Tuple → List → [1, 2]
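Explicit conversion only works when the value makes sense for the target type. A few behaviours worth knowing: `int()` on a float truncates toward zero rather than rounding, a string such as `"3.5"` must go through `float()` first, and invalid text raises a `ValueError`:

```python
truncated = int(3.9)            # int() truncates toward zero
truncated_neg = int(-3.9)       # also toward zero for negative numbers
rounded = round(3.9)            # round() gives the nearest integer
via_float = int(float("3.5"))   # "3.5" -> 3.5 -> 3

failed = False
try:
    int("3.5")                  # not a valid integer literal
except ValueError as err:
    failed = True
    print("ValueError:", err)

print(truncated, truncated_neg, rounded, via_float)
```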

#### **1.4.2 Implicit Type Conversion**

General rule: in mixed arithmetic, the narrower type is promoted to the wider one: bool → int → float → complex

In [None]:
True + 5      # bool→int → 6

In [None]:
3 + 5.0       # int→float → 8.0

In [None]:
2.5 + 3j      # float→complex → (2.5+3j)

In [None]:
mixed1 = (1, 'a'); mixed1 # a tuple holding an integer and a string: no conversion takes place

In [None]:
type(mixed1)

In [None]:
list_mixed2 = [True, 'a']  # logical value coerced to string in R, but not in Python
list_mixed2

In [None]:
list_mixed3 = [True, 1]  # logical value is coerced to numeric in R; Python keeps both types, although True == 1 is True
list_mixed3

In [None]:
# NumPy arrays hold a single dtype: mixing numbers with a string coerces every element to a string
nparray = np.array([5, False, 4.6, 'No']); nparray
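When no string is involved, NumPy instead promotes every element to the most general *numeric* type, following the same bool → int → float order as above. A short sketch comparing the cases:

```python
import numpy as np

numeric = np.array([5, False, 4.6])   # bool and int promoted to float
print(numeric, numeric.dtype)

ints = np.array([True, 2, 3])         # bool promoted to int
print(ints)

mixed = np.array([5, False, 4.6, 'No'])
print(mixed.dtype.kind)               # 'U' means a Unicode string array
```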