
# Classes

## Python Objects

### Notes

* Python is an object-oriented programming language.
* Almost everything is an object, with properties and methods.
* A **class** is like an object constructor.

In this course we'll briefly go over classes, but they won't be covered in detail.

## Simple Example

### Create Class

We're creating a class called `WendyList`; don't worry about the code that follows.
- This class operates very similarly to a `list` object.

In [31]:
class WendyList:
    def __init__(self) -> None:
        """Initialize an empty list."""
        self._items = []

    def add(self, item):
        """Add an item to the end of the list."""
        self._items.append(item)

    def __getitem__(self, index):
        """Retrieve an item by index."""
        return self._items[index]

    def __setitem__(self, index, value):
        """Set an item at a specific index."""
        self._items[index] = value

    def __repr__(self):
        """Return a string representation of the list."""
        return str(self._items)

    def __len__(self):
        """Return the length of the list."""
        return len(self._items)

### Create Instance

#### Notes

* We can use the class to create an instance of it by calling: `WendyList()`

In [32]:
my_list = WendyList()

my_list

[]

In [33]:
my_list.add("Data Nerd")
my_list.add("Finance Nerd")

my_list

['Data Nerd', 'Finance Nerd']

We can even use functions like `len()` on it:

In [34]:
len(my_list)

2
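
The class above also defines `__getitem__` and `__setitem__`, which are what make square-bracket indexing work. As a minimal sketch (using a hypothetical, trimmed-down `TinyList` class rather than the one above), reading and assigning by index might look like this:

```python
class TinyList:
    """Hypothetical stand-in for the list-like class above (illustration only)."""

    def __init__(self):
        self._items = []

    def add(self, item):
        self._items.append(item)

    def __getitem__(self, index):
        # Called for `obj[index]`
        return self._items[index]

    def __setitem__(self, index, value):
        # Called for `obj[index] = value`
        self._items[index] = value

    def __len__(self):
        return len(self._items)


tiny = TinyList()
tiny.add("Data Nerd")
tiny.add("Finance Nerd")

first = tiny[0]           # uses __getitem__
tiny[1] = "Python Nerd"   # uses __setitem__
print(first, tiny[1], len(tiny))
```

Without `__getitem__` and `__setitem__`, the square-bracket syntax would raise a `TypeError` on this object.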

## Practical Example

### Demonstration

Recall that we built functions to automate salary calculations.

In [35]:
def calculate_salary(base_salary, bonus_rate = .1):
    """
    Calculate the total salary based on the base salary and bonus rate.

    Args:
        base_salary (float): The base salary.
        bonus_rate (float): The bonus rate. Default is .1.

    Returns:
        float: The total salary.
    """
    return base_salary * (1 + bonus_rate)

def calculate_bonus(total_salary, base_salary):
    """
    Calculate the bonus rate based on the total salary and base salary.

    Args:
        total_salary (float): The total salary.
        base_salary (float): The base salary.

    Returns:
        float: The bonus rate.
    """
    return (total_salary - base_salary) / base_salary
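
As a quick sketch of how these two functions relate (the numbers are illustrative, and the functions are repeated without docstrings so the snippet runs on its own), one roughly undoes the other:

```python
def calculate_salary(base_salary, bonus_rate=0.1):
    return base_salary * (1 + bonus_rate)


def calculate_bonus(total_salary, base_salary):
    return (total_salary - base_salary) / base_salary


base = 100_000  # illustrative value
total = calculate_salary(base, bonus_rate=0.1)
rate = calculate_bonus(total, base)

# Rounded for display, since floating-point arithmetic can leave tiny errors
print(round(total, 2))
print(round(rate, 4))
```

The class in the next cell bundles the same arithmetic together with the data it operates on.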

In [36]:
class BaseSalary:
    def __init__(self, base_salary, bonus_rate = .1, symbol = "$") -> None:
        self.base_salary = base_salary
        self.bonus_rate = bonus_rate
        self.symbol = symbol
        self.total_salary = base_salary * (1 + bonus_rate)
        self.bonus = self.total_salary - self.base_salary

    def __repr__(self):
        return f"{self.symbol}{self.base_salary:,.0f}"

    def show_salary(self):
        return f"{self.symbol}{self.total_salary:,.0f}"

    def show_bonus(self):
        return f"{self.symbol}{self.bonus:,.0f}"

We can have it print out the salary formatted correctly.

In [37]:
salary = BaseSalary(100_000)

salary

$100,000

In [38]:
salary.show_salary()

'$110,000'

In [39]:
salary.show_bonus()

'$10,000'
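
The formatting comes from the `:,.0f` format specifier inside the f-strings: `,` adds thousands separators and `.0f` rounds to zero decimal places. A small sketch with a made-up amount:

```python
symbol = "$"          # same default symbol as in the class above
amount = 1234567.891  # made-up value for illustration

no_decimals = f"{symbol}{amount:,.0f}"   # thousands separators, no decimals
two_decimals = f"{symbol}{amount:,.2f}"  # thousands separators, two decimals

print(no_decimals)   # $1,234,568
print(two_decimals)  # $1,234,567.89
```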

## `__init__()` function

### Notes

* All classes have a function called `__init__()`, which is always executed when an object is created from the class.
* Use the `__init__()` function to assign values to object properties, or other operations that are necessary when the object is created.
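
To make this concrete, here is a minimal sketch with a hypothetical `Job` class (not part of the course code). `__init__()` is not called by name; Python runs it automatically when `Job(...)` is called:

```python
class Job:
    """Hypothetical class used only to illustrate __init__()."""

    def __init__(self, title, location="Remote"):
        # Runs automatically on Job(...); `self` is the newly created object
        self.title = title
        self.location = location


analyst = Job("Data Analyst")                # location falls back to the default
engineer = Job("Data Engineer", "New York")  # location given explicitly

print(analyst.title, "-", analyst.location)
print(engineer.title, "-", engineer.location)
```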


### Example

Create a class named `DataScienceJobsList` and use the `__init__()` function to assign values for `jobs`.

In [40]:
class DataScienceJobsList:
    def __init__(self, jobs) -> None:
        self.jobs = jobs

Example of how to use this class: we'll create a list of data science jobs called `data_science_jobs`.

In [41]:
data_science_jobs = [
    {'job_title': 'Data Scientist', 'job_skills': "Python, SQL, Machine Learning"},
    {'job_title': 'Data Analyst', 'job_skills': "SQL, Excel, Python"},
    {'job_title': 'Machine Learning Engineer', 'job_skills': "Python, TensorFlow, Keras"}
]

Then we will create an instance of this class called `job_list` and display it.

In [42]:
job_list = DataScienceJobsList(data_science_jobs)

job_list

<__main__.DataScienceJobsList at 0x7846d193bd40>

Okay, if we try to print or display this object, it just shows something like `<__main__.DataScienceJobsList at 0x7846d193bd40>` (the class name and a memory address), which isn't very useful. So we'll use a function called `__str__()` to output something better.

## `__str__()` function

### Notes

* The `__str__()` function defines what is returned when the object is converted to a string, for example by `print()` or `str()`.
* If it's not set, Python falls back to the default representation of the object (as in the example above).
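
A minimal sketch of the difference, using two hypothetical classes (`PlainSkill` and `Skill`, not part of the course code): `print()` and `str()` use `__str__()` when it exists, while `repr()`, which is what a notebook shows when an object is the last line of a cell, is unaffected by it.

```python
class PlainSkill:
    """Hypothetical class without __str__()."""

    def __init__(self, name):
        self.name = name


class Skill:
    """Hypothetical class with __str__()."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return f"Skill: {self.name}"


plain_text = str(PlainSkill("SQL"))  # falls back to the default representation
nice_text = str(Skill("SQL"))        # uses __str__()
default_repr = repr(Skill("SQL"))    # repr() ignores __str__()

print(plain_text)
print(nice_text)
print(default_repr)
```

This is also why the `BaseSalary` class earlier defined `__repr__()`: it controls what is shown when the object is simply displayed rather than printed.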

### Example

Create a class named `DataScienceJobsList` that has:
* What we did before:
    * The `__init__()` function to assign values for `jobs` (what we created in the last example).
* Now:
    * A `__str__()` function to print out the data science jobs.

Note: Each time we create/add a new method, we have to redefine the whole class in another cell.

In [43]:
class DataScienceJobsList:
    def __init__(self, jobs) -> None:
        '''
        Initializes the DataScienceJobsList object with a list of jobs.
        '''
        self.jobs = jobs

    def __str__(self) -> str:
        '''
        Returns a string representation of the data science jobs list.
        '''
        jobs_str = 'Data Science Jobs:\n'

        for job in self.jobs:
            # Assuming job_skills is initially a string; it will be split later
            jobs_str += f"- {job['job_title']}: {job['job_skills']}\n"

        return jobs_str

Then we will create an instance of this class called `job_list` and print it. Because of the `__str__()` function, it will now **print out what we told it to instead of the default representation.**

In [44]:
job_list = DataScienceJobsList(data_science_jobs)

print(job_list)

Data Science Jobs:
- Data Scientist: Python, SQL, Machine Learning
- Data Analyst: SQL, Excel, Python
- Machine Learning Engineer: Python, TensorFlow, Keras



## Object Methods

### Notes

* Objects can also contain methods.
* Methods are functions that belong to the object.

### Example

Create a class named `DataScienceJobsList` that has:
* What we created before:
    * The `__init__()` function to assign values for `jobs`
    * A `__str__()` function to print out the data science jobs
* Now:
    * A `split_skills` method that converts the job skills to a list

In [45]:
class DataScienceJobsList:
    def __init__(self, jobs):
        self.jobs = jobs

    def __str__(self):
        jobs_str = 'Data Science Jobs:\n'
        for job in self.jobs:
            jobs_str += f"- {job['job_title']}: {', '.join(job['job_skills'])}\n"
        return jobs_str

    def split_skills(self):
        for job in self.jobs:
            job['job_skills'] = job['job_skills'].split(', ')

Then we will create an instance of this class called `jobs_list` and call the `split_skills()` method.

In [46]:
jobs_list = DataScienceJobsList(data_science_jobs)

jobs_list.split_skills() # Ensure this is called to split the skills into lists

print(job_list)  # prints the earlier job_list instance, which shares the same (now split) job dictionaries

Data Science Jobs:
- Data Scientist: ['Python', 'SQL', 'Machine Learning']
- Data Analyst: ['SQL', 'Excel', 'Python']
- Machine Learning Engineer: ['Python', 'TensorFlow', 'Keras']



## Extras

These are extra examples.

### Printing Class

What if we tried printing this class now?

🐛 **Debugging**

**These are intentional mistakes**

This is used to demonstrate debugging.

Error: This code will return incorrect output rather than what we want.

Steps to Debug:

1. Look at the actual error, can you tell what the problem is?
2. If not, then look it up:
  1. Use a chatbot like ChatGPT or Claude
  2. Look it up using Google

In [47]:
class DataScienceJobsList:
    def __init__(self, jobs):
        self.jobs = jobs

    def __str__(self):
        jobs_str = 'Data Science Jobs:\n'
        for job in self.jobs:
            jobs_str += f"- {job['job_title']}: {', '.join(job['job_skills'])}\n"
        return jobs_str

    def split_skills(self):
        for job in self.jobs:
            job['job_skills'] = job['job_skills'].split(', ')

In [48]:
jobs_list = DataScienceJobsList(data_science_jobs)
print(jobs_list)

Data Science Jobs:
- Data Scientist: Python, SQL, Machine Learning
- Data Analyst: SQL, Excel, Python
- Machine Learning Engineer: Python, TensorFlow, Keras



Oh, this isn't quite right. The `__str__()` method joins `job['job_skills']` assuming it's a list, but if `__str__()` is called before `split_skills()`, the skills are still strings.

In that case the method joins the individual characters of the string instead of a list of skills. (The output above only looks fine because `split_skills()` was already run on `data_science_jobs` in an earlier cell, which converted the skills to lists in place.)

We'll fix it in one of two ways:

* Make sure `split_skills()` is called before any attempt to print the object.
* Adjust the `__str__()` method to handle skills that are either still a single string or already split into a list.
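
To see why passing a string to `join()` is a problem, here is a small sketch with a made-up skills string. `str.join()` iterates over whatever it is given, and iterating over a string yields single characters:

```python
skills_text = "SQL, R"                 # made-up, unsplit skills string
skills_list = skills_text.split(", ")  # ['SQL', 'R']

joined_from_list = ", ".join(skills_list)  # joins the two skills
joined_from_text = ", ".join(skills_text)  # joins every character, including ',' and ' '

print(joined_from_list)
print(joined_from_text)
```

The first result reads like the original text, while the second has a separator between every character.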

In [51]:
class DataScienceJobsList:
    def __init__(self, jobs):
        self.jobs = jobs

    def __str__(self) -> str:
        jobs_str = 'Data Science Jobs:\n'

        for job in self.jobs:
            if isinstance(job['job_skills'], list):
                skills_str = ', '.join(job['job_skills'])
            else:
                skills_str = job['job_skills']
            jobs_str += f"- {job['job_title']}: {skills_str}\n"

        return jobs_str

    def split_skills(self):
        for job in self.jobs:
            job['job_skills'] = job['job_skills'].split(', ')

Now the `__str__()` method will correctly handle `job['job_skills']` whether it's a pre-split list or a string that hasn't been split yet.

In [52]:
jobs_list = DataScienceJobsList(data_science_jobs)

print(jobs_list)

Data Science Jobs:
- Data Scientist: Python, SQL, Machine Learning
- Data Analyst: SQL, Excel, Python
- Machine Learning Engineer: Python, TensorFlow, Keras



### Add in Method

The `__str__()` method is good for providing a quick and readable description of the object's state, but we should also create a method that can be called explicitly to print the details of each job to the console.

Explanation:

* It iterates through the list of jobs, prints the job title.
* Then iterates through the list of required skills, printing each one.
* Remember it's not returning a value, instead it directly outputs to the console.

🐛 **Debugging**

**These are intentional mistakes**

This is used to demonstrate debugging.

Errors: This happens either because we've called `split_skills()` more than once on the same `DataScienceJobsList` instance (or on the same underlying job data), or because the initial data passed to `__init__()` already had the skills as lists for some entries.

Steps to Debug:

1. Look at the actual error, can you tell what the problem is?
2. If not, then look it up:
  1. Use a chatbot like ChatGPT or Claude
  2. Look it up using Google

In [53]:
class DataScienceJobsList:
    def __init__(self, jobs):
        self.jobs = jobs

    def __str__(self):
        jobs_str = 'Data Science Jobs:\n'
        for job in self.jobs:
            # Check if job_skills is already a list or still a string
            if isinstance(job['job_skills'], list):
                skills_str = ', '.join(job['job_skills'])
            else:
                skills_str = job['job_skills']
            jobs_str += f"- {job['job_title']}: {skills_str}\n"
        return jobs_str

    def split_skills(self):
        for job in self.jobs:
            job['job_skills'] = job['job_skills'].split(', ')

    def display_jobs(self):
        for job in self.jobs:
            print(f"Job Title: {job['job_title']}")
            print("Required Skills:")
            for skill in job['job_skills']:
                print(f"- {skill}")
            print()  # Adds an empty line for better readability

In [54]:
jobs_list = DataScienceJobsList(data_science_jobs)

jobs_list.split_skills()
jobs_list.display_jobs()

AttributeError: 'list' object has no attribute 'split'
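
One possible way to reproduce this error in isolation is sketched below, using a hypothetical trimmed-down `JobsList` class and a made-up one-job list. `split_skills()` changes the job dictionaries in place, so a second instance built from the same list starts with skills that are already lists:

```python
class JobsList:
    """Hypothetical, trimmed-down version of the class above."""

    def __init__(self, jobs):
        self.jobs = jobs  # keeps a reference to the caller's list, not a copy

    def split_skills(self):
        for job in self.jobs:
            job['job_skills'] = job['job_skills'].split(', ')


jobs_data = [{'job_title': 'Data Analyst', 'job_skills': "SQL, Excel, Python"}]

first = JobsList(jobs_data)
first.split_skills()               # jobs_data itself now holds a list of skills
print(jobs_data[0]['job_skills'])

second = JobsList(jobs_data)       # new instance, same underlying dictionaries
try:
    second.split_skills()          # fails: the skills are already a list
except AttributeError as error:
    message = str(error)
    print(message)
```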

Solution: Modify the `split_skills()` method to check whether `job['job_skills']` is a string before attempting to split it. Now, if `split_skills()` is called multiple times, it won't try to split skills that are already lists, which avoids the error.

In [55]:
class DataScienceJobsList:
    def __init__(self, jobs):
        self.jobs = jobs

    def __str__(self):
        jobs_str = 'Data Science Jobs:\n'
        for job in self.jobs:
            if isinstance(job['job_skills'], list):
                skills_str = ', '.join(job['job_skills'])
            else:
                skills_str = job['job_skills']
            jobs_str += f"- {job['job_title']}: {skills_str}\n"
        return jobs_str

    def split_skills(self):
        for job in self.jobs:
            if isinstance(job['job_skills'], str):  # Check if job_skills is a string
                job['job_skills'] = job['job_skills'].split(', ')

    def display_jobs(self):
        for job in self.jobs:
            print(f"Job Title: {job['job_title']}")
            print("Required Skills:")
            for skill in job['job_skills']:
                print(f"- {skill}")
            print()  # Adds an empty line for better readability

It works now 🙌

In [56]:
jobs_list = DataScienceJobsList(data_science_jobs)
jobs_list.split_skills()
jobs_list.display_jobs()

Job Title: Data Scientist
Required Skills:
- Python
- SQL
- Machine Learning

Job Title: Data Analyst
Required Skills:
- SQL
- Excel
- Python

Job Title: Machine Learning Engineer
Required Skills:
- Python
- TensorFlow
- Keras



That's it for classes. Now onto an exercise and then we'll dive into actual data analysis.