# Definitions


- **Objects**: a data record with fields; an instance of a class. Everything in Python is an object.

In [None]:
type(1000)

int

1000 is an instance of the class 'int', so it is an object. Likewise, "Hello World" is an instance of the class 'str', so it is also an object.

- **Variables**: names that reference an object. They act as a data store.

In [None]:
job_title = "Data Analyst"
job_location = "United Kingdom"
job_salary = 100000

You can also assign a function to a variable. For instance:

In [None]:
my_print_func = print
my_print_func("Hi there")

Hi there


- **Functions**: a reusable piece of code that performs a specific task. To run a function, you call it with parentheses (), placing any arguments inside them. An argument is a value passed to the function when it is called.

In [None]:
print("What is up!")

What is up!


This is an example of a built-in function. We can also define our own functions using the def keyword:

In [None]:
def greet():
  return "What is up!"


greet()

'What is up!'

- **Classes**: a template for creating objects (records). Remember, 1000 is an instance of the class 'int'.

- **Attributes**: the variables of an object (a field in a record), defined by its class. Use the dot operator on an instance of a class to access its attributes and methods.

In [None]:
class JobPost:
  def __init__(self, title, location, salary):
    self.title = title
    self.location = location
    self.salary = salary


job_1 = JobPost(job_title, job_location, job_salary)
job_1.title

'Data Analyst'

- **Methods**: functions defined inside a class that operate on its objects. Remember to add () when calling a method.

In [None]:
job_salary.__add__(20)

100020
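The same idea applies to classes we write ourselves. The sketch below re-creates the JobPost class from above and adds a summary() method; the method name and its wording are illustrative assumptions, not part of the original notes. Calling the method on an instance passes that instance in as self, so job_1.summary() and JobPost.summary(job_1) do the same thing.

```python
class JobPost:
    def __init__(self, title, location, salary):
        # Attributes: data stored on each instance
        self.title = title
        self.location = location
        self.salary = salary

    def summary(self):
        # Method (hypothetical): a function defined in the class that reads the instance's attributes
        return f"{self.title} in {self.location}, paying {self.salary}"


job_1 = JobPost("Data Analyst", "United Kingdom", 100000)
print(job_1.summary())          # method, so () is needed
print(JobPost.summary(job_1))   # same call written out: job_1 is passed as self
print(job_1.title)              # attribute, so no ()
```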

# Strings & String Formatting

Below are some of the built-in methods of the str class. To list all of them, run help(str).

In [None]:
skill = "Python"
skill.upper()

'PYTHON'

In [None]:
help(str)

Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(self, format_spec, /)
 |      Return a formatted version of the string as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  

Why does help show upper(self, /)? The names inside the parentheses are the parameters the method accepts.

If we were to write the extended version, it would be: str.upper("Python")

This is in the format class.method(instance_of_str). Therefore, skill.upper() automatically passes skill, which is equal to "Python", in as self.

Note: the forward slash / is not a parameter itself. It marks every parameter to its left (here, self) as positional-only, meaning it cannot be passed by keyword. Parameters to its right can be passed either by position or by keyword. In addition, functions may have more than one positional parameter.
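A small sketch of both points, assuming Python 3.8 or later (where the / marker can be used in your own function definitions). The describe() function is hypothetical, written only to show how / behaves:

```python
skill = "Python"

# skill.upper() is shorthand for str.upper(skill): skill is passed as self
print(skill.upper(), str.upper(skill))


# Hypothetical function: title is positional-only because it sits left of /
def describe(title, /, location="Remote"):
    return f"{title} ({location})"


print(describe("Data Analyst"))                     # location uses its default
print(describe("Data Analyst", "London"))           # location passed by position
print(describe("Data Analyst", location="London"))  # ...or by keyword

try:
    describe(title="Data Analyst")  # not allowed: title is positional-only
except TypeError as err:
    print("TypeError:", err)
```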

In [None]:
skill.replace("P", "J")

'Jython'

In [None]:
job_title.split(sep=" ", maxsplit=1)

['Data', 'Analyst']

.split() accepts the keyword arguments sep and maxsplit, so naming them explicitly (sep=" ", maxsplit=1) makes it clear what each value does.

There are also magic (or "dunder") methods, whose names start and end with a double underscore. These include: __add__, __contains__, __eq__, etc.

In [None]:
str.__add__("Data", " Scientist")

'Data Scientist'

In [None]:
skill.__len__()

6

In [None]:
len("Python")

6

For concatenating and formatting strings (for example, to create the string Role: Data Analyst), we can use:
- the + operator
- .format() method
- formatted string literals (f-strings)
- % formatting (printf-style string formatting)
- .join() method (the argument needs to be an iterable of strings)

Note: An iterable is any Python object capable of returning its members one at a time, permitting it to be iterated over in a for-loop. Familiar examples of iterables include lists, tuples, and strings

In [None]:
"Role: " + job_title

'Role: Data Analyst'

In [None]:
"Role: {}; Skill Required: {}".format(job_title, skill)

'Role: Data Analyst; Skill Required: Python'

In [None]:
f"for this job, you will need to be: a {job_title} and have {skill} knowledge"

'for this job, you will need to be: a Data Analyst and have Python knowledge'

In [None]:
"Role: %s" %(job_title)

'Role: Data Analyst'

In [None]:
'|'.join(skill)

'P|y|t|h|o|n'
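Because .join() is called on the separator and takes an iterable, it is usually used with a list of strings. In this sketch the salary figures are made-up sample values; note that numbers have to be converted with str() first, since .join() raises a TypeError for non-string items:

```python
skills = ["Python", "SQL", "Excel"]

# The separator comes before .join(); the list is the iterable argument
print(", ".join(skills))   # Python, SQL, Excel

# Convert non-string items to str first, otherwise join raises TypeError
salaries = [100000, 85000, 92000]
print(" | ".join(str(s) for s in salaries))   # 100000 | 85000 | 92000
```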

# Conditional Statements

These statements evaluate whether a condition is true or false and then execute code based on the result. They consist of:

- if: runs its block when the first condition is met
- elif: checked only if the conditions above it were not met
- else: fallback that runs if none of the conditions are met

Note: a block of code is only executed if its condition is true

In [None]:
if True:
  print("It is true")

if False:
  print("It is false")


It is true


In [None]:
applicant_skill = "Python"
years_experience = 6

if skill == applicant_skill:
  print("You are eligible")
elif years_experience > 4:
  print("Enough experience to apply")
else:
  print("You need Python knowledge")

You are eligible


In [None]:
applicant_skill = "SQL"
years_experience = 6

if skill == applicant_skill:
  print("You are eligible")
elif years_experience > 4:
  print("Enough experience to apply")
else:
  print("You need Python knowledge")

Enough experience to apply
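Wrapping the same logic in a function makes it easy to test all three branches, including the else branch that neither example above reaches. check_applicant() is a hypothetical helper written for illustration; it returns the message instead of printing it:

```python
def check_applicant(applicant_skill, years_experience, required_skill="Python"):
    """Return the message for the first condition that is true (hypothetical helper)."""
    if applicant_skill == required_skill:
        return "You are eligible"
    elif years_experience > 4:
        return "Enough experience to apply"
    else:
        return "You need Python knowledge"


print(check_applicant("Python", 6))  # first condition true, so elif is never checked
print(check_applicant("SQL", 6))     # falls through to elif
print(check_applicant("SQL", 2))     # no condition true, so else runs
```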


# Logical Operators & Bitwise Operators

# Lists


Lists store an ordered collection of items. They are written with square brackets and can contain items of any data type (a list is a sequence data type). Common methods include:

- append: add an additional item to the end of a list
- remove: removes the first occurrence of a given value

Note: lists are mutable objects, meaning they can be changed after creation

In [None]:
job_skills = ['sql', 'tableau', 'excel']
job_skills.append('python')
print(job_skills)

['sql', 'tableau', 'excel', 'python']


In [None]:
job_skills.remove('tableau')
print(job_skills)

['sql', 'excel', 'python']


In [None]:
help(list)

Help on class list in module builtins:

class list(object)
 |  list(iterable=(), /)
 |  
 |  Built-in mutable sequence.
 |  
 |  If no argument is given, the constructor creates a new empty list.
 |  The argument must be an iterable if specified.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __iadd__(self, value, /)
 |      Implement self+=value.
 |  
 |  __imul__(self, value, /)
 |      Implement self*=value.
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self))

In the job_skills list, items have an assigned index. The first item is assigned an index 0, the second item 1, the third item 2 etc.

We can access a specific item by typing the variable name followed by square brackets containing the item's index. Indexes also let us add items at specific positions using the insert(index, item) method.

In [None]:
job_skills[1]

'excel'

In [None]:
job_skills.insert(1, "powerbi")
job_skills

['sql', 'powerbi', 'excel', 'python']

The slicing syntax is important for accessing items in a list.

- Syntax: list[start:end:step]
- start: starting index (inclusive). By default it is 0
- end: ending index (exclusive). By default it is the end of the list, so the last item is included
- step: steps to take between items. By default it is 1

In [None]:
job_skills[0:4:2]

['sql', 'excel']
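A few more slices of the same list illustrate the defaults and negative values. This is a sketch using the job_skills list as it stands at this point in the notes:

```python
job_skills = ['sql', 'powerbi', 'excel', 'python']

print(job_skills[1:3])   # index 3 is excluded -> ['powerbi', 'excel']
print(job_skills[:2])    # start defaults to 0 -> ['sql', 'powerbi']
print(job_skills[2:])    # end defaults to the end of the list -> ['excel', 'python']
print(job_skills[-1])    # negative index counts from the end -> 'python'
print(job_skills[::-1])  # negative step walks backwards -> reversed copy
```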

Unpacking is the last key concept. It involves assigning each of the values in an iterable, like a list, to its own variable in a single statement.

In [None]:
job_skills

['sql', 'powerbi', 'excel', 'python']

In [None]:
skill1, skill2, skill3, skill4 = job_skills
print(skill1)
print(skill2)
print(skill3)
print(skill4)

sql
powerbi
excel
python


To unpack only specific items from the list, we can use the star (*) operator, which collects multiple items (which may not be important to us) into a single list variable. Below, the first skill in the list, sql, is assigned to key_skills, and the rest are collected into unwanted.

In [None]:
key_skills, *unwanted= job_skills
print(key_skills)
print(unwanted)

sql
['powerbi', 'excel', 'python']
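The starred variable does not have to come last; it can sit anywhere, as long as there is only one. Without a star, the number of variables must match the number of items exactly. A minimal sketch with the same list:

```python
job_skills = ['sql', 'powerbi', 'excel', 'python']

first, *middle, last = job_skills
print(first)   # sql
print(middle)  # ['powerbi', 'excel']
print(last)    # python

# Without *, two variables cannot hold four items
try:
    a, b = job_skills
except ValueError as err:
    print("ValueError:", err)
```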


# Dictionaries


Dictionaries store data values as key:value pairs and are written with curly brackets {}. A dictionary is a mapping data type.

A dictionary is an ordered (it keeps insertion order) and mutable collection. Values can be duplicated, but keys cannot. Keys must also be hashable: they have a hash value that never changes during their lifetime, which is why immutable types such as strings, numbers and tuples of such values can be keys, but lists cannot.
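A short sketch of these rules; the names and salary figures are made-up sample values:

```python
# Assigning to an existing key overwrites its value, so keys stay unique
salaries = {'Luke': 90000, 'Kelly': 95000}
salaries['Luke'] = 100000
print(salaries)  # {'Luke': 100000, 'Kelly': 95000}

# Duplicate values are fine
teams = {'Luke': 'data', 'Kelly': 'data'}

# A tuple of strings is hashable, so it can be a key...
office = {('London', 'UK'): 12}

# ...but a list is mutable and unhashable, so it cannot
try:
    bad = {['London', 'UK']: 12}
except TypeError as err:
    print("TypeError:", err)
```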

In [None]:
tech_skills={
    'database':'postgres',
    'language': 'python',
    'library': 'pandas'
}
print(tech_skills)

{'database': 'postgres', 'language': 'python', 'library': 'pandas'}


Common methods of the class 'dict' include:
- __getitem__ magic method: x.__getitem__(y), where x is the dict name and y is the key (the same as writing x[y])
- keys() method, used to obtain the keys of a dictionary in the format x.keys()
- values() method, used to obtain the values of a dictionary in the format x.values()
- pop() method, which removes the key:value pair for a given key and returns its value, in the format x.pop(key)
- update() method, which adds key:value pairs from another dictionary to the end (overwriting the value of any key that already exists)

In [None]:
tech_skills.__getitem__('database')

'postgres'

In [None]:
tech_skills.keys()

dict_keys(['database', 'language', 'library'])

In [None]:
tech_skills.values()

dict_values(['postgres', 'python', 'pandas'])

In [None]:
tech_skills.pop('library')
print(tech_skills)

{'database': 'postgres', 'language': 'python'}


In [None]:
tech_skills.update({'cloud': 'google cloud'})
print(tech_skills)

{'database': 'postgres', 'language': 'python', 'cloud': 'google cloud'}
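The sketch below starts again from the original three-item dictionary to show two details of the methods listed above: square-bracket lookup is the everyday way of calling __getitem__, and update() overwrites a key that already exists instead of adding a duplicate. The new 'sql' value is only an example:

```python
tech_skills = {'database': 'postgres', 'language': 'python', 'library': 'pandas'}

# Square brackets call __getitem__ behind the scenes
print(tech_skills['database'] == tech_skills.__getitem__('database'))  # True

# pop() removes the pair and returns the removed value
removed = tech_skills.pop('library')
print(removed)  # pandas

# update() appends new keys and overwrites existing ones in place
tech_skills.update({'cloud': 'google cloud', 'language': 'sql'})
print(tech_skills)
# {'database': 'postgres', 'language': 'sql', 'cloud': 'google cloud'}
```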


# Sets/Tuples


Sets are similar to lists, but they are defined with curly brackets {}. They are mutable, but unlike lists they only hold unique items: if you convert a list to a set, any duplicated items are removed.

Because they are designed for unique items, sets are unordered and do not support indexing. They are mostly used to extract unique values: convert a list with set(), then convert the result back to a list with list().

In [None]:
job_skills.append("sql")
job_skills

['sql', 'powerbi', 'excel', 'python', 'sql']

In [None]:
set(job_skills)

{'excel', 'powerbi', 'python', 'sql'}
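Converting back to a list and sorting gives a predictable result, because the order of items in a set is not guaranteed. A sketch using the same five-item list:

```python
job_skills = ['sql', 'powerbi', 'excel', 'python', 'sql']

unique_skills = list(set(job_skills))   # duplicates removed, order not guaranteed
print(len(unique_skills))               # 4

# sorted() accepts a set and returns a list in a predictable order
print(sorted(set(job_skills)))          # ['excel', 'powerbi', 'python', 'sql']

# Sets have no indexes
try:
    set(job_skills)[0]
except TypeError as err:
    print("TypeError:", err)
```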

Tuples are defined with parentheses () but, unlike lists, they are immutable: you cannot add, remove or replace items. However, they can be indexed.

Therefore, the items are fixed, which is useful in software engineering for data that should not change.

In [None]:
lukes_skills = ('python', 'sql', 'statistics', 'tableau')
lukes_skills

('python', 'sql', 'statistics', 'tableau')
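A sketch of what "indexable but immutable" means in practice; the extra 'excel' skill is only an example value:

```python
lukes_skills = ('python', 'sql', 'statistics', 'tableau')

print(lukes_skills[0])    # indexing works -> python
print(lukes_skills[1:3])  # so does slicing -> ('sql', 'statistics')

try:
    lukes_skills[0] = 'r'  # items cannot be replaced
except TypeError as err:
    print("TypeError:", err)

# To "change" a tuple, build a new one instead (note the trailing comma)
updated = lukes_skills + ('excel',)
print(updated)
```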

# Loops

Looping means repeating something over and over until a particular condition is satisfied. In Python, we have:
- for loop
- while loop

Note: DRY (don't repeat yourself) is a key principle in programming.

In [None]:
number_list= [1,4,6,3,2,7,5,6,3,9,0,3,2]
x=0

for n in number_list:
  if n == 3:
    x+=1

print(x)

3


In the code above, the for loop iterates through each number in the list and counts the number of times 3 appears. n is the loop variable and number_list is the iterable.

We can use loops to determine all the data analyst jobs in a job_list and then use the len() function to determine the number of analyst-specific jobs.
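A minimal sketch of that approach, reusing the job_roles list that appears later in the List Comprehension section. The check is a case-sensitive substring test, so a title written as 'analyst' in lower case would not be counted:

```python
job_roles = ['Data Engineer', 'Cook', 'Financial Analyst',
             'QA Cloud engineer', 'Data Analyst JP', 'Producer']

analyst_jobs = []
for job in job_roles:
    if 'Analyst' in job:          # substring check, case-sensitive
        analyst_jobs.append(job)

print(analyst_jobs)       # ['Financial Analyst', 'Data Analyst JP']
print(len(analyst_jobs))  # 2
```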

We can also use for loops with dictionaries. In the example below, we have a dictionary of key:value pairs (applicant name: years of experience), and we want to hire applicants with more than 5 years of experience.

In [None]:
yrs_experience = {
    'Luke': 3,
    'Kelly': 6,
    'Ken': 4,
    'Alex': 3
}

for n in yrs_experience:
  print(n)

Luke
Kelly
Ken
Alex


Looping over a dictionary directly only gives us its keys. To get the key:value pairs out, we need to use the items() method.

In [None]:
yrs_experience = {
    'Luke': 3,
    'Kelly': 6,
    'Ken': 4,
    'Alex': 3
}

for key, value in yrs_experience.items():
  print(key, value)

Luke 3
Kelly 6
Ken 4
Alex 3


We can then check whether each value is greater than 5 and, if so, add the corresponding key (the applicant's name) to the eligible applicants list.

Note: we first create an empty list [] so that there is a list object we can append to.

In [None]:
yrs_experience = {
    'Luke': 3,
    'Kelly': 6,
    'Ken': 4,
    'Alex': 3
}

eligible_applicants = []
for key, value in yrs_experience.items():
  if value > 5:
    eligible_applicants.append(key)

print(eligible_applicants)

['Kelly']


The range() function is also used in for loops to iterate a specified number of times.

In [None]:
range(6)

range(0, 6)

In [None]:
for x in range(10):
  print(x)

0
1
2
3
4
5
6
7
8
9
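range() also accepts optional start and step arguments: range(start, stop, step). Wrapping it in list() shows the numbers it produces:

```python
print(list(range(6)))         # stop is exclusive -> [0, 1, 2, 3, 4, 5]
print(list(range(2, 6)))      # start is inclusive -> [2, 3, 4, 5]
print(list(range(0, 10, 3)))  # third argument is the step -> [0, 3, 6, 9]
print(list(range(5, 0, -1)))  # negative step counts down -> [5, 4, 3, 2, 1]
```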


Next are **while** loops. These check whether a condition is true or false before each pass through the loop.

While the condition remains true, the code inside the loop keeps executing. With while loops, we need to define a counter variable (in the example below, count) and update it inside the loop; otherwise the condition never becomes false and the loop runs forever.

In [None]:
count=2

while count < 6:
  print(count)
  count+=1

2
3
4
5
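While loops are most useful when we do not know in advance how many iterations we need. In this sketch, the starting salary, the 10% raise and the 100,000 target are made-up values; the loop counts how many raises it takes to reach the target:

```python
salary = 50000
years = 0

# Keep applying a 10% raise until the salary reaches 100,000
while salary < 100000:
    salary *= 1.1
    years += 1   # updating the variables is what eventually ends the loop

print(years)            # 8
print(round(salary, 2))
```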


# List Comprehension

List comprehension offers a shorter syntax when you want to create a new list based on the values of an existing list.

Consider the code below, which builds a list with a for loop:

In [None]:
numbers=[]

for number in range(10):
  numbers.append(number)

print(numbers)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


We can also write it like this with a list comprehension:

In [None]:
numbers_comp=[x for x in range(10)]
print(numbers_comp)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


syntax: newList = [ expression(element) **for** element **in** oldList **if** condition]

- expression: the operation you want to execute on every item within the iterable.
- element: the variable that takes each value from the iterable in turn.
- iterable: the sequence of elements you want to iterate through (e.g., a list, tuple, or string).
- condition (optional): a filter; only elements for which it is true are included.

In [None]:
new=[x*2 for x in numbers_comp]
print(new)

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]


In [None]:
new=[float(x) for x in numbers_comp]
print(new)

[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]


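This is not limited to lists; the same syntax also applies to:
- set comprehension (curly brackets)
- dictionary comprehension (curly brackets with key: value)
- generator expressions (parentheses). There is no tuple comprehension: wrap a generator expression in tuple() to build a tuple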
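A sketch of the other forms, using the job_skills list with its duplicate 'sql':

```python
job_skills = ['sql', 'powerbi', 'excel', 'python', 'sql']

# Set comprehension: curly brackets, duplicates dropped
upper_skills = {skill.upper() for skill in job_skills}
print(sorted(upper_skills))   # ['EXCEL', 'POWERBI', 'PYTHON', 'SQL']

# Dictionary comprehension: key: value before the for
skill_lengths = {skill: len(skill) for skill in job_skills}
print(skill_lengths)  # {'sql': 3, 'powerbi': 7, 'excel': 5, 'python': 6}

# Parentheses create a generator expression, not a tuple;
# pass it to tuple() to get a tuple
squares = tuple(x * x for x in range(5))
print(squares)        # (0, 1, 4, 9, 16)
```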

Using our previous example that scans analyst jobs in a list, this can be written as a single line.

In [None]:
job_roles= ['Data Engineer', 'Cook', 'Financial Analyst', 'QA Cloud engineer', 'Data Analyst JP', 'Producer']
len(job_roles)

6

In [None]:
analyst_roles= [job for job in job_roles if 'Analyst' in job]
print(analyst_roles)

['Financial Analyst', 'Data Analyst JP']


# Functions/Lambda


Remember, **functions** are a reusable piece of code that performs a specific task. To run a function, you call it with parentheses (), placing any arguments inside them. An argument is a value passed to the function when it is called.

The len() function is commonly used to tell us the number of items in a list (or key:value pairs in a dictionary), and the type() function returns the data type of a given object.

There are 5 different types of functions:
- Built-in functions: e.g., print()
- User-defined functions: def my_function():
- Lambda functions: lambda x: x + 1
- Standard library functions (modules): e.g., math.sqrt()
- 3rd party library functions (libraries): e.g., numpy.array()

We can also create user-defined functions. For instance, to calculate the total salary, which is base salary * (1 + bonus rate), we can create a calc function:

In [None]:
def calc(base_salary, bonus_rate):
  total_salary = base_salary * (1+ bonus_rate)
  return total_salary

calc(100000,0.1)

110000.00000000001

Remember to write return on the last line, followed by the value you want the function to give back. Without a return statement, the function returns None.

- Here, base_salary and bonus_rate are parameters which are variables inside parentheses in the function definition and act as placeholders for arguments

- Arguments are values passed to the function when it is called

Note: to make an argument optional, give its parameter a default value in the definition (e.g. bonus_rate=0.1). Parameters with default values must come after those without.
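A sketch of calc with an optional bonus_rate. The default of 0.1 and the call to round() are additions for illustration; round() hides the small floating-point error seen in 110000.00000000001, which comes from 0.1 not being exactly representable in binary:

```python
def calc(base_salary, bonus_rate=0.1):
    """Return base_salary * (1 + bonus_rate); bonus_rate defaults to 10%."""
    total_salary = base_salary * (1 + bonus_rate)
    # Round to 2 decimal places to hide tiny floating-point errors
    return round(total_salary, 2)


print(calc(100000))                   # bonus_rate omitted, so 0.1 is used
print(calc(100000, 0.2))              # positional argument overrides the default
print(calc(100000, bonus_rate=0.05))  # keyword argument also works
```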

---



**Lambda (anonymous) functions** can take any number of arguments, but can only have one expression.

Syntax: 'lambda' argument(s) : expression



In [None]:
double = lambda x: x*2
print(double(9))

18


A lambda can also be defined and called in a single line, and it can take more than one argument:

In [None]:
(lambda x, y: x * (y * 3))(3, 7)

63

Lambda functions can be used inside a list comprehension as the expression that produces each new item. We can use this to:
- calculate the total salary by multiplying a fixed number by the base salaries
- data filtering

In [None]:
raw_salary=[1000, 2000, 3000, 4000, 5000]

upd_salary=[(lambda x: x*1.1)(salary) for salary in raw_salary]
upd_salary

[1100.0, 2200.0, 3300.0000000000005, 4400.0, 5500.0]
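In practice, lambdas are most often passed to built-in functions such as filter(), map() and sorted(). A sketch reusing raw_salary and the yrs_experience dictionary from the Loops section; the 3000 threshold is an arbitrary example:

```python
raw_salary = [1000, 2000, 3000, 4000, 5000]

# filter() keeps the items for which the lambda returns True
high_salaries = list(filter(lambda x: x >= 3000, raw_salary))
print(high_salaries)   # [3000, 4000, 5000]

# map() applies the lambda to every item (like the comprehension above, with rounding added)
upd_salary = list(map(lambda x: round(x * 1.1, 2), raw_salary))
print(upd_salary)      # [1100.0, 2200.0, 3300.0, 4400.0, 5500.0]

# key= tells sorted() what to compare: here, each applicant's years of experience
yrs_experience = {'Luke': 3, 'Kelly': 6, 'Ken': 4, 'Alex': 3}
ranked = sorted(yrs_experience.items(), key=lambda pair: pair[1], reverse=True)
print(ranked)  # [('Kelly', 6), ('Ken', 4), ('Luke', 3), ('Alex', 3)]
```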

# Recursive functions


A recursive function is a function that calls itself (the function call is inside the function's own definition).

The function keeps calling itself until it reaches the base case, the condition (usually checked with an if statement) under which it returns without calling itself again. The concept is very similar to looping.

For instance, to print every positive even number from a starting even number down to 2 (e.g. 10, 8, 6, 4, 2):

In [None]:
def EvenNums(num):
  print(num)
  if num == 2:
    return num
  else:
    return EvenNums(num-2)

EvenNums(10)

10
8
6
4
2


2

In [None]:
def Fibonacci(idx):
  if idx <= 1:
    return idx
  else:
    # both terms must be recursive calls
    return Fibonacci(idx-1) + Fibonacci(idx-2)

Fibonacci(8)

21
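The plain recursive version recalculates the same values many times, so it slows down sharply as idx grows. One common fix is memoisation; this sketch uses functools.lru_cache from the standard library to cache results (the lower-case name fibonacci is just a PEP 8-style rename). Also note that very deep recursion hits Python's recursion limit, which is 1000 by default:

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # remember results so each idx is computed only once
def fibonacci(idx):
    if idx <= 1:          # base cases: fibonacci(0) = 0, fibonacci(1) = 1
        return idx
    return fibonacci(idx - 1) + fibonacci(idx - 2)


print(fibonacci(8))    # 21
print(fibonacci(50))   # 12586269025, fast thanks to the cache
```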

# Modules & packages

A module is a Python file containing Python definitions and statements. A module can define functions, classes, and variables. A module can also include runnable code. A Python package usually consists of several modules. Physically, a package is a folder containing modules and maybe other folders that themselves may contain more folders and modules. Conceptually, it’s a namespace. This simply means that a package’s modules are bound together by a package name, by which they may be referenced.

To import a module into your current environment, you type:
import module_name
where module_name is the file name without the .py extension.

To use something defined in the module, such as a function, you type:
module_name.function_name()

In this case, I created a module called 'jobanalyzer.py' and defined a calc function in it.

In [None]:
import jobanalyzer
jobanalyzer.calc(200000)

220000.00000000003
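The contents of jobanalyzer.py are not shown in these notes. Judging only from the output above (200000 becomes 220000.00000000003), calc appears to apply a 10% bonus by default; the file below is a hypothetical reconstruction under that assumption, not the original. Saved as jobanalyzer.py in the same folder as the notebook, it would let import jobanalyzer work as shown:

```python
# jobanalyzer.py (hypothetical contents; the original file is not shown)

def calc(base_salary, bonus_rate=0.1):
    """Return the total salary: base_salary * (1 + bonus_rate)."""
    return base_salary * (1 + bonus_rate)
```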

If we are using a few functions from another module, we can type:

- from jobanalyzer import calc, function2, function3

so we no longer need the dot notation. To import all names from a module, we can type:

- from jobanalyzer import *

We usually avoid importing everything because of name clashes: if we define a function with the same name as one from the imported module (or import two modules that share a name), one silently replaces the other.
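A sketch of that clash using the standard library's statistics module; the local mean() function is hypothetical and exists only to show the problem:

```python
import statistics
from statistics import *   # brings in mean, median, mode and other names


def mean(values):
    # Hypothetical local function reusing the name "mean":
    # it silently replaces the mean imported from statistics
    return "not the statistics mean"


print(mean([1, 2, 3]))             # not the statistics mean

# The dot notation keeps the module's version reachable
print(statistics.mean([1, 2, 3]))  # 2
```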

Note: the Python standard library provides many modules we can import. Below, we import functions from the statistics module to calculate the mean and median of a list.

In [None]:
salary_list=[ 30486, 98320, 100456, 234512, 32032, 48920, 357689]

In [None]:
from statistics import mean, median, mode

mean(salary_list)


128916.42857142857

In [None]:
median(salary_list)

98320


Libraries in Python are collections of modules and packages that provide pre-written code to perform various tasks

# Classes

- **Classes**: a template for creating objects (records). Remember, 1000 is an instance of the class 'int'. Classes are useful when we need to bundle data together with methods so that an object behaves in a certain way.

Remember, functions inside a class are known as methods. Class attributes are variables defined on the class itself and shared by every object of that class, whereas instance attributes (set on self) belong to each individual object.
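A minimal sketch of the difference between class and instance attributes; the platform attribute and its values are illustrative assumptions:

```python
class JobPost:
    platform = "LinkedIn"   # class attribute (hypothetical): shared by all instances

    def __init__(self, title):
        self.title = title  # instance attribute: separate for each object


job_1 = JobPost("Data Analyst")
job_2 = JobPost("Data Engineer")

print(job_1.title, job_2.title)        # Data Analyst Data Engineer
print(job_1.platform, job_2.platform)  # LinkedIn LinkedIn

JobPost.platform = "Indeed"            # changing the class attribute...
print(job_1.platform, job_2.platform)  # ...is seen by every instance: Indeed Indeed
```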

The most well-recognized magic method is the __init__ method, which initialises a new object with specific attributes. For instance, if we were to devise numerous functions for salary calculations, we can create a class for them:

In [None]:
class BaseSalary:
  def __init__(self):
    pass

BaseSalary(100000)

TypeError: BaseSalary.__init__() takes 1 positional argument but 2 were given

The error occurs because Python automatically passes the new object as the first argument (self), so __init__ receives two positional arguments (self and 100000) but only accepts one. We therefore need to add parameters after self, such as base_salary (and later bonus_rate).

In addition, inside the __init__ method, we assign these values to attributes on self. If we do not, the values are discarded when __init__ finishes and the object has no such attributes.

Initialization: below, we set the attribute self.base_salary equal to the base_salary argument we pass in (here, 100).

In [None]:
class BaseSalary:
  def __init__(self, base_salary):
    self.base_salary = base_salary


salary_calc=BaseSalary(100)
salary_calc


<__main__.BaseSalary at 0x7fcb0682fa60>

In [None]:
class BaseSalary:
  def __init__(self, base_salary):
    self.base_salary = base_salary


salary_calc=BaseSalary(100)
salary_calc.base_salary

100

Here, the attribute of base_salary is equal to 100. Note we do not need to add () for attributes.

The full class would be:

In [None]:
class BaseSalary:
  def __init__(self, base_salary, bonus_rate=0.1, symbol='$'):
    self.base_salary = base_salary
    self.bonus_rate = bonus_rate
    self.symbol = symbol


salary_calc=BaseSalary(100)
salary_calc.symbol

'$'

To control how an object is displayed, we can define the __repr__ magic method and have it return an f-string.

Note:
- in the f-string, we reference the attributes (the variables of an object, defined by its class) by typing self.attribute_name
- to format a number with thousands separators, add the format specifier :, after the attribute inside the braces, e.g. {self.base_salary:,}
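The format specifier also controls decimal places: :,.2f adds the thousands separator and rounds to two decimals, which avoids long floating-point tails in currency output. A sketch with the same salary figures:

```python
base_salary = 100000
total_salary = base_salary * 1.1   # 110000.00000000001 because of floating point

print(f"{base_salary:,}")      # 100,000
print(f"{total_salary:,}")     # 110,000.00000000001
print(f"{total_salary:,.2f}")  # 110,000.00 (comma separator + 2 decimal places)
```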

In [None]:
class BaseSalary:
  def __init__(self, base_salary, bonus_rate=0.1, symbol='$'):
    self.base_salary = base_salary
    self.bonus_rate = bonus_rate
    self.symbol = symbol
    self.total_salary = base_salary * (1 + bonus_rate)


  def __repr__(self):
    return f'{self.symbol}{self.base_salary:,}'

  def salcalc(self):
    return f'{self.symbol}{self.total_salary:,}'


salary_calc=BaseSalary(10000)
salary_calc.salcalc()

'$11,000.0'

Now, we add the rest of the calculations. We store **total_salary** and **bonus_value** as attributes, set equal to their respective calculations in __init__, so that each method simply references them.

In [None]:
class BaseSalary:
  def __init__(self, base_salary, bonus_rate=0.1, symbol='$'):
    self.base_salary = base_salary
    self.bonus_rate = bonus_rate
    self.symbol = symbol
    self.total_salary = base_salary * (1 + bonus_rate)
    self.bonus_value = (self.total_salary - self.base_salary)


  def __repr__(self):
    return f'{self.symbol}{self.base_salary:,}'

  def salcalc(self):
    return f'{self.symbol}{self.total_salary:,}'

  def bonuscalc(self):
    return f'{self.symbol}{self.bonus_value:,}'


salary_calc = BaseSalary(100000)
print(salary_calc.salcalc())
print(salary_calc.bonuscalc())

$110,000.00000000001
$10,000.000000000015


Alternatively, instead of storing the result as an attribute in __init__, a method can carry out the full calculation itself each time it is called:

In [None]:
class BaseSalary:
  def __init__(self, base_salary, bonus_rate=0.1, symbol='$'):
    self.base_salary = base_salary
    self.bonus_rate = bonus_rate
    self.symbol = symbol

  def __repr__(self):
    return f'{self.symbol}{self.base_salary:,}'

  def calculate_total_salary(self):
    # computed on demand instead of being stored in __init__
    return (self.base_salary * (1 + self.bonus_rate))

salary_calc = BaseSalary(100000)
salary_calc.calculate_total_salary()

110000.00000000001

# Libraries


A Python library is a collection of Python packages or modules that are designed to perform specific tasks. Libraries can make everyday tasks more efficient, and are often used to perform data analysis, machine learning, and other tasks.

Libraries can be installed from PyPI (PyPi.org) using pip, e.g. pip install pandas, or through Anaconda.

- pip is a shell (e.g. bash) command. In a notebook such as Google Colab, prefix it with ! to run it from a cell: '**!pip install pandas**'

- to list all packages in the environment, type '**!pip list**'

In this environment, version 2.2.2 of pandas is installed.


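Common 3rd party libraries:
- Pandas: popular for working with tabular data through DataFrames
- NumPy: numerical computing with fast multi-dimensional arrays
- Matplotlib: we can take data we want to visualise and transform it into almost any plot we would like
- Seaborn: statistical data visualisation built on top of Matplotlib
- SciPy: scientific computing (e.g. optimisation, integration, statistics)
- Scikit-learn: machine learning (e.g. classification, regression, clustering)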

In [None]:
!pip list

Package                            Version
---------------------------------- -------------------
absl-py                            1.4.0
accelerate                         0.34.2
aiohappyeyeballs                   2.4.3
aiohttp                            3.10.10
aiosignal                          1.3.1
alabaster                          0.7.16
albucore                           0.0.16
albumentations                     1.4.15
altair                             4.2.2
annotated-types                    0.7.0
anyio                              3.7.1
argon2-cffi                        23.1.0
argon2-cffi-bindings               21.2.0
array_record                       0.5.1
arviz                              0.19.0
astropy                            6.1.4
astropy-iers-data                  0.2024.10.7.0.32.46
astunparse                         1.6.3
async-timeout                      4.0.3
atpublic                           4.1.0
attrs                              24.2.0
audioread        

# Additional topics


- Arrays
- Inheritance
- Iterators
- Polymorphism
- Scope
- Dates
- JSON
- RegEx
- pip
- try...except
- user input

I learnt the rest of these concepts using W3Schools.
