# Restaurant Rating

Description

You work as a data analyst at PacFood, a food delivery service. Your task is to create a class that manages essential details about a restaurant, including its name, cuisine type, ratings, and special offers. The class should allow adding new ratings and calculating the average rating based on the stored ratings.

Instructions:

- Create a class named Restaurant with the following requirements:

    It should have 4 inputs when initialized:
    - name: (str) The name of the restaurant.
    - cuisine: (str) The type of cuisine the restaurant offers.
    - ratings: (list) A list of individual ratings (each rating should be a float between 1 and 5).
    - offers: (list) A list of current offers available at the restaurant.

    Also create one instance attribute for each of the four inputs, using the same names.

- Create a method named add_rating that takes a single input:

    - new_rating: (float) The new rating to be added.
    
    It should be added to the ratings list, and the updated list of ratings should be returned.

- Create a method named get_avg_ratings:

    This method should return the average rating of the restaurant.

    If there are no ratings, return 0.0.
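
The instructions above state a 1-to-5 range for ratings and a 0.0 fallback for an empty list. A minimal sketch of both checks as standalone functions; `is_valid_rating` and `average` are hypothetical helper names used only for illustration and are not part of the exercise:

```python
def is_valid_rating(rating, low=1.0, high=5.0):
    """Hypothetical helper: True if rating is a number within [low, high]."""
    # bool is a subclass of int, so it is excluded explicitly.
    if isinstance(rating, bool) or not isinstance(rating, (int, float)):
        return False
    return low <= rating <= high


def average(ratings):
    """Hypothetical helper: mean of the ratings, or 0.0 for an empty list."""
    return sum(ratings) / len(ratings) if ratings else 0.0


print(is_valid_rating(4.5))                 # True
print(is_valid_rating(5.5))                 # False
print(average([]))                          # 0.0
print(average([4.5, 4.0, 5.0, 3.0, 4.0]))   # 4.1
```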

In [None]:
# Expected Output:

# Name: Spicy Delight
# Cuisine: Indian
# Initial Ratings: [4.5, 4.0, 5.0]
# Offers: ['20% off on first order', 'Free delivery on orders over $30']
# Updated Ratings after adding 3.0: [4.5, 4.0, 5.0, 3.0]
# Updated Ratings after adding 4.0: [4.5, 4.0, 5.0, 3.0, 4.0]
# Average Rating: 4.1

In [8]:
class Restaurant:
    ### YOUR CODE HERE
    def __init__(self, name, cuisine, ratings, offers):
        self.name = name
        self.cuisine = cuisine
        self.ratings = ratings
        self.offers = offers
        
    def add_rating(self, new_rating):
        self.ratings.append(new_rating)
        return self.ratings
        
    def get_avg_ratings(self):
        # Return 0.0 for an empty list instead of dividing by zero
        if not self.ratings:
            return 0.0
        return sum(self.ratings) / len(self.ratings)

# Don't change code below
# Create the restaurant object
restaurant_x = Restaurant(
    name="Spicy Delight",
    cuisine="Indian",
    ratings=[4.5, 4.0, 5.0],
    offers=["20% off on first order", "Free delivery on orders over $30"]
)

# Print restaurant info
print(f"Name: {restaurant_x.name}")
print(f"Cuisine: {restaurant_x.cuisine}")
print(f"Initial Ratings: {restaurant_x.ratings}")
print(f"Offers: {restaurant_x.offers}")

# Add new ratings and print updated ratings
updated_ratings_1 = restaurant_x.add_rating(3.0)
print(f"Updated Ratings after adding 3.0: {updated_ratings_1}")

updated_ratings_2 = restaurant_x.add_rating(4.0)
print(f"Updated Ratings after adding 4.0: {updated_ratings_2}")

# Get average rating
average_rating = restaurant_x.get_avg_ratings()
print(f"Average Rating: {average_rating:.1f}")

Name: Spicy Delight
Cuisine: Indian
Initial Ratings: [4.5, 4.0, 5.0]
Offers: ['20% off on first order', 'Free delivery on orders over $30']
Updated Ratings after adding 3.0: [4.5, 4.0, 5.0, 3.0]
Updated Ratings after adding 4.0: [4.5, 4.0, 5.0, 3.0, 4.0]
Average Rating: 4.1


# Data Quality Monitoring

Description

You are a data engineer at PacData, and your task is to create a class that monitors data quality for a dataset. This class will be used to check for missing values, duplicates, clean the dataset, and allow new records to be added dynamically, ensuring the data is ready for analysis.

In this task, missing values are defined as any None values or empty strings (""). These should be identified and counted in each record to assess the data quality.
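
The definition of a missing value can be sketched as a small predicate. This is only an illustration of the rule stated above; `is_missing` is a hypothetical helper name, and the sample record is made up for the example:

```python
def is_missing(value):
    """Hypothetical helper: True for None or an empty string."""
    return value is None or value == ""


# Made-up record with one None and one empty string
record = {"id": 2, "value": None, "name": ""}
missing_columns = [key for key, value in record.items() if is_missing(value)]

print(missing_columns)  # ['value', 'name']
print(is_missing(0))    # False: zero is a real value, not a missing one
```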

Instructions:

- Create a class named DataQualityMonitor with the following requirements:

    The class should be initialized without requiring any input.
    It should have one instance attribute, dataset, which is a list to store the records of the dataset. Each record is represented as a dictionary. By default, this attribute should be initialized as an empty list.

- Create a method named add_data:

    This method should take a new_record (dict) as input and append it to the dataset.

- Create a method named check_missing_values:

    This method should iterate through the dataset and count the number of missing values (None or empty strings) for each column.
    It should then generate a report as a dictionary showing the number of missing values for each column and return the dictionary.
    {<column>: <number of missing values>} 

- Create a method named check_duplicates:

    This method should count how many duplicate records exist in the dataset and return the count.

- Create a method named clean_data:

    This method should remove:
    Any records with missing values.
    Any duplicate records.
    After cleaning, it should update the dataset and return the number of remaining records.

In [1]:
# Expected Output:

# Initial dataset: []
# Missing values report: {'id': 0, 'value': 1, 'name': 0}
# Duplicate records found: 1.
# Data cleaned. Remaining records: 2.
# [{'id': 1, 'value': 10, 'name': 'Alice'}, {'id': 3, 'value': 20, 'name': 'Charlie'}]

In [1]:
class DataQualityMonitor:
    def __init__(self):
        self.dataset = []

    def add_data(self, new_record):
        self.dataset.append(new_record)

    def check_missing_values(self):
        missing_report = {}
        for record in self.dataset:
            for key, value in record.items():
                if value is None or value == "":
                    missing_report[key] = missing_report.get(key, 0) + 1
                else:
                    missing_report.setdefault(key, 0)
        return missing_report

    def check_duplicates(self):
        seen = set()
        duplicates_count = 0
        for record in self.dataset:
            # Sort the items so records with the same content but a different key order still match
            record_tuple = tuple(sorted(record.items()))
            if record_tuple in seen:
                duplicates_count += 1
            else:
                seen.add(record_tuple)
        return duplicates_count

    def clean_data(self):
        cleaned_dataset = [record for record in self.dataset if all(value not in (None, "") for value in record.values())]

        unique_dataset = []
        for record in cleaned_dataset:
            if record not in unique_dataset:
                unique_dataset.append(record)
        self.dataset = unique_dataset

        return len(self.dataset)

# Test Case
# Creating a DataQualityMonitor object with sample data
data_monitor = DataQualityMonitor()

# Displaying initial dataset
print(f'Initial dataset: {data_monitor.dataset}')

# Adding new records
data_monitor.add_data({"id": 1, "value": 10, "name": "Alice"})
data_monitor.add_data({"id": 2, "value": None, "name": "Bob"})
data_monitor.add_data({"id": 3, "value": 20, "name": "Charlie"})
data_monitor.add_data({"id": 1, "value": 10, "name": "Alice"})  # Duplicate record

# Running data quality checks
missing_values = data_monitor.check_missing_values()
duplicates_count = data_monitor.check_duplicates()
remaining_records = data_monitor.clean_data()

# Displaying the results
print(f"Missing values report: {missing_values}")
print(f"Duplicate records found: {duplicates_count}.")
print(f"Data cleaned. Remaining records: {remaining_records}.")
print(data_monitor.dataset)

Initial dataset: []
Missing values report: {'id': 0, 'value': 1, 'name': 0}
Duplicate records found: 1.
Data cleaned. Remaining records: 2.
[{'id': 1, 'value': 10, 'name': 'Alice'}, {'id': 3, 'value': 20, 'name': 'Charlie'}]


# Simple Label Encoder

Description

You are a data scientist at PacAnalytics, and your task is to implement a universal label encoding mechanism that follows specific rules. The encoding should assign the smallest category a value of 0, the next smallest a value of 1, and so on. This implementation will allow you to handle various categorical datasets effectively for your machine learning models, ensuring that categorical variables are converted into numerical values suitable for algorithm processing.

Instructions:
- Define a class named SimpleLabelEncoder that contains the following attributes:

    The class should be initialized without requiring any input parameters.
    It should have 2 attributes:
    encoding: (dict) An initially empty dictionary that will store the label encoding for the categorical values.
    encoded_data: (list) An initially empty list that will store the label-encoded representation of the input data.
- Create a method named fit:

    This method should take a list of strings as input that you want to encode.
    The method should:
    Sort the unique categories alphabetically.
    Assign the smallest category the value of 0, the next smallest 1, and so on, storing these mappings in encoding.
- Create a method named transform:

    This method should take a list of values as input and replace the values with their corresponding label-encoded integers based on encoding.
    The result should be stored in encoded_data, where each entry is the label-encoded value.
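
The sort-then-number rule can be sketched outside the class as follows. The `-1` sentinel for labels that were never fitted is an assumption made for this example; the exercise does not say how unseen labels should be handled. Note that Python compares strings by code point, so uppercase letters sort before lowercase ones.

```python
categories = ["Premium", "Standard", "Basic", "Premium", "Basic"]

# Unique labels in sorted order, numbered from 0
encoding = {label: index for index, label in enumerate(sorted(set(categories)))}
print(encoding)  # {'Basic': 0, 'Premium': 1, 'Standard': 2}

# Assumption: map labels missing from the encoding to -1 instead of dropping them
encoded = [encoding.get(label, -1) for label in categories + ["Gold"]]
print(encoded)  # [1, 2, 0, 1, 0, -1]
```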

In [6]:
# Expected Output:

# Initial Encodings: {}
# Initial Encoded Data: []
# Encodings: {'Basic': 0, 'Premium': 1, 'Standard': 2}
# Encoded Data: [1, 2, 0, 1, 0]

In [1]:
class SimpleLabelEncoder:
    def __init__ (self):
        self.encoding = {}
        self.encoded_data = []
    
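    def fit (self, data):
        # Start from an empty mapping so refitting does not keep stale labels
        self.encoding = {}
        data_sorted = sorted(set(data))
        for idx, item in enumerate(data_sorted):
            self.encoding[item] = idx

    def transform (self, data):
        # Start from an empty list so repeated calls do not accumulate results
        self.encoded_data = []
        for item in data:
            if item in self.encoding:
                self.encoded_data.append(self.encoding[item])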
    
encoder = SimpleLabelEncoder()

# Input data to be encoded
# data_to_encode = ['Silver', 'Gold', 'Bronze', 'Silver', 'Bronze', 'Gold']
data_to_encode = ["Premium", "Standard", "Basic", "Premium", "Basic"]

# Displaying initial encoding and encoded data
print(f"Initial Encodings: {encoder.encoding}")
print(f"Initial Encoded Data: {encoder.encoded_data}")

# Fitting the encoder with the data
encoder.fit(data_to_encode)

# Displaying the encoding after fitting 
print(f"Encodings: {encoder.encoding}")

# Transforming the data with the fitted encoder
encoder.transform(data_to_encode)

# Displaying the encoded data after transforming
print(f"Encoded Data: {encoder.encoded_data}")

Initial Encodings: {}
Initial Encoded Data: []
Encodings: {'Basic': 0, 'Premium': 1, 'Standard': 2}
Encoded Data: [1, 2, 0, 1, 0]


# Employee Management

Description

You work as a data analyst at PacCorp, a technology company that manages various types of employees. Your task is to create a class hierarchy that includes a base class Employee and derived classes FullTimeEmployee and PartTimeEmployee. This system will help PacCorp maintain employee records, including their roles and working hours.

Instructions:
- Define a base class named Employee with the following requirements:

    It should have three inputs when initialized:
    name: (str) The name of the employee.
    employee_id: (str) A unique identifier for the employee.
    base_salary: (float) The base salary of the employee.
    Create three instance attributes with the same names.
- Define a derived class named FullTimeEmployee that inherits from Employee and adds the following requirements:

    It should have one additional input when initialized:
    benefits: (list) A list of benefits provided to full-time employees (e.g., health insurance, retirement plan).
    Create an instance attribute to store the benefits.
- Define another derived class named PartTimeEmployee that inherits from Employee and adds the following requirements:

    It should have one additional input when initialized:
    hours_worked: (float) The total hours worked by the part-time employee.
    Create an instance attribute to store the hours worked.
    Display employee information for both full-time and part-time employees by printing each attribute directly.

In [4]:
# Expected Output:

# Name: Alice Smith
# Employee ID: FT123
# Base Salary: $80000.00
# Benefits: ['Health Insurance', 'Retirement Plan']
# Name: Bob Johnson
# Employee ID: PT456
# Base Salary: $40000.00
# Hours Worked: 20

In [3]:
class Employee:
    def __init__ (self, name, employee_id, base_salary):
        self.name = name
        self.employee_id = employee_id
        self.base_salary = base_salary
    
class FullTimeEmployee (Employee):
    def __init__ (self, name, employee_id, base_salary, benefits):
        super().__init__(name, employee_id, base_salary)
        self.benefits = benefits

class PartTimeEmployee (Employee):
    def __init__ (self, name, employee_id, base_salary, hours_worked):
        super().__init__(name, employee_id, base_salary)
        self.hours_worked = hours_worked

# Creating a full-time employee object
full_time_employee = FullTimeEmployee(
    name="Alice Smith",            
    employee_id="FT123",          
    base_salary=80000,            
    benefits=["Health Insurance", "Retirement Plan"]  
)

# Displaying full-time employee information
print(f"Name: {full_time_employee.name}")
print(f"Employee ID: {full_time_employee.employee_id}")
print(f"Base Salary: ${full_time_employee.base_salary:.2f}")
print(f"Benefits: {full_time_employee.benefits}")

# Creating a part-time employee object
part_time_employee = PartTimeEmployee(
    name="Bob Johnson",            
    employee_id="PT456",          
    base_salary=40000,          
    hours_worked=20              
)

# Displaying part-time employee information
print(f"Name: {part_time_employee.name}")
print(f"Employee ID: {part_time_employee.employee_id}")
print(f"Base Salary: ${part_time_employee.base_salary:.2f}")
print(f"Hours Worked: {part_time_employee.hours_worked}")

Name: Alice Smith
Employee ID: FT123
Base Salary: $80000.00
Benefits: ['Health Insurance', 'Retirement Plan']
Name: Bob Johnson
Employee ID: PT456
Base Salary: $40000.00
Hours Worked: 20


# Standardization Scaler

Description

You are a data scientist at PacData, responsible for preparing and scaling data for machine learning models. Your objective is to create a class that performs Standardization Scaling on a list of data points. This process ensures that the scaled data has a mean of zero and a standard deviation of one, which is useful for many machine learning algorithms.

Instructions:

- Define a class named StandardizationScaler that does not take any arguments during initialization.

- Create a method named fit:

    This method should take a list of data as input.
    It should calculate and store the mean and standard deviation of the data using the following formulas:

    $$
    \bar{X} = \frac{1}{n}\sum_{i=1}^n (X_i)
    $$

    $$
    \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2}
    $$

    Where $X_i$ is each value in the data, and $n$ is the number of data points.

- Create a method named transform:

    This method should take a list of data as input and return a new list where each value is scaled using the following formula:

    $$
    X_{scaled} = \frac{X - \bar{X}}{\sigma}
    $$

    Where $X$ is a value from the list, $\bar{X}$ is the mean, and $\sigma$ is the standard deviation. This method should only work after fit has been called. Ensure that the scaled results are rounded to two decimal places.

- Create a method named fit_transform:

    This method should take a list of data as input. It should combine the functionality of both fit and transform.
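
The formulas above can be worked through step by step on the sample data used in this exercise. This is a sketch for checking the arithmetic, not the class itself; the cross-check against `statistics.pstdev` relies on that function computing the population standard deviation (division by n), which matches the formula given here.

```python
import math
import statistics

data = [150, 200, 250, 300, 350]

mean = sum(data) / len(data)
# Squared deviations: 10000, 2500, 0, 2500, 10000 (sum 25000)
variance = sum((x - mean) ** 2 for x in data) / len(data)
std = math.sqrt(variance)
print(mean, variance)  # 250.0 5000.0

# The standard library's population standard deviation should agree
assert math.isclose(std, statistics.pstdev(data))

scaled = [round((x - mean) / std, 2) for x in data]
print(scaled)  # [-1.41, -0.71, 0.0, 0.71, 1.41]
```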

In [None]:
# Expected Output:

# Mean: 250.0
# Standard Deviation: 70.71067811865476
# Scaled Data: [-1.41, -0.71, 0.0, 0.71, 1.41]
# ============================================
# Mean: 250.0
# Standard Deviation: 70.71067811865476
# Scaled Data: [-1.41, -0.71, 0.0, 0.71, 1.41]

In [9]:
class StandardizationScaler:
    def fit (self, data):
        self.mean = sum(data) / (len(data))
        cumm = 0
        for i in data:
            cumm += (i - self.mean)**2
        self.std = (cumm / len(data))**(1/2)
    
    def transform (self, data):
        x_scaled = [round(((i - self.mean) / self.std), 2) for i in data]
        return x_scaled
    
    def fit_transform (self, data):
        self.fit(data)
        return self.transform(data)
    
# Creating a standardization scaler object
scaler = StandardizationScaler()

# Fitting the data
scaler.fit([150, 200, 250, 300, 350])
print("Mean:", scaler.mean)
print("Standard Deviation:", scaler.std)

# Transforming the data
scaled_data = scaler.transform([150, 200, 250, 300, 350])
print("Scaled Data:", scaled_data)

print("============================================")

# Alternatively, using fit_transform in one step
scaled_data = scaler.fit_transform([150, 200, 250, 300, 350])
print("Mean:", scaler.mean)
print("Standard Deviation:", scaler.std)
print("Scaled Data:", scaled_data)

Mean: 250.0
Standard Deviation: 70.71067811865476
Scaled Data: [-1.41, -0.71, 0.0, 0.71, 1.41]
============================================
Mean: 250.0
Standard Deviation: 70.71067811865476
Scaled Data: [-1.41, -0.71, 0.0, 0.71, 1.41]


# Ride Order Management

Description

You are a data analyst at PacRide, a ride-hailing service, and your task is to create a class named RideOrder to manage ride orders. Each ride order will involve creating a new ride, updating its status as it is picked up by the driver, marking the ride as completed, and processing payment upon completion.

Instructions:
- Define a class named RideOrder with the following requirements:

    It should have five inputs when initialized:
    order_id: (str) A unique identifier for the ride order.
    customer_name: (str) The name of the customer who requested the ride.
    pickup_location: (str) The starting location for the ride.
    destination: (str) The destination of the ride.
    fare: (float) The total fare for the ride (in dollars).
    The class should have the following attributes:
    order_id, customer_name, pickup_location, destination, fare, and
    status: (str) Stores the current status of the ride (default is "Pending").
- Create a method named pick_up_order:

    This method should update the ride’s status to "Picked Up".
    Return the message: "Order {order_id} picked up. Status updated to Picked Up.".
- Create a method named complete_order:

    This method should update the ride’s status to "Completed".
    Return the message: "Order {order_id} completed. Status updated to Completed.".
- Create a method named process_payment:

    This method should take one input:
    amount_paid: (float) The amount the customer is paying.
    If the amount_paid is less than the fare, return the message: "Payment failed: Insufficient amount.".
    If the amount_paid is equal to or greater than the fare, calculate the change and return the message: "Payment of ${amount_paid} processed for Order {order_id}. Change: ${change}.".

In [11]:
# Expected Output:

# Order PR123 created for Alice Johnson from Downtown to Airport. Status: Pending.
# Order PR123 picked up. Status updated to Picked Up.
# Order PR123 completed. Status updated to Completed.
# Payment failed: Insufficient amount.
# Payment of $30.0 processed for Order PR123. Change: $7.0.

In [10]:
class RideOrder:
    def __init__(self, order_id, customer_name, pickup_location, destination, fare):
        self.order_id = order_id
        self.customer_name = customer_name
        self.pickup_location = pickup_location
        self.destination = destination
        self.fare = fare
        self.status = 'Pending'

    def pick_up_order(self):
        self.status = 'Picked Up'
        return f'Order {self.order_id} picked up. Status updated to Picked Up.'
    
    def complete_order(self):
        self.status = 'Completed'
        return f'Order {self.order_id} completed. Status updated to Completed.'
    
    def process_payment(self, amount_paid):
        if amount_paid < self.fare:
            return 'Payment failed: Insufficient amount.'
        
        # amount_paid >= fare at this point, so the change is never negative
        change = amount_paid - self.fare
        return f'Payment of ${amount_paid} processed for Order {self.order_id}. Change: ${change}.'
    
# Create a RideOrder object
ride_order_1 = RideOrder(
    order_id="PR123",                          
    customer_name="Alice Johnson",             
    pickup_location="Downtown",                
    destination="Airport",                     
    fare=23.00)                                 

# Display initial status order
print(f"Order {ride_order_1.order_id} created for {ride_order_1.customer_name} from {ride_order_1.pickup_location} to {ride_order_1.destination}. Status: Pending.")

# Picking up the order
pickup_message = ride_order_1.pick_up_order()

# Completing the order
completion_message = ride_order_1.complete_order()

# Processing payment with an amount less than the fare
payment_failed_message = ride_order_1.process_payment(20.00)

# Processing payment with the correct amount
payment_success_message = ride_order_1.process_payment(30.00)

# Output the messages
print(pickup_message)
print(completion_message)
print(payment_failed_message)
print(payment_success_message)
            

Order PR123 created for Alice Johnson from Downtown to Airport. Status: Pending.
Order PR123 picked up. Status updated to Picked Up.
Order PR123 completed. Status updated to Completed.
Payment failed: Insufficient amount.
Payment of $30.0 processed for Order PR123. Change: $7.0.


# Word Frequency Counter

Description

You are a data analyst at PacNews, tasked with analyzing a body of text to identify the most frequently used words. This helps to understand keyword trends and content focus. You will create a class that counts how often each word appears in a text and returns the most common words. This analysis helps content analysts at PacNews identify key themes and trends in large documents or news articles, improving content categorization and focusing on important topics.

Instructions:
- Define a class named WordCounter that contains the following requirements:

    It should have 1 input when initialized:
    text: (str) A string of text from which words will be counted.
    Create an instance attribute named text to store the input value.
- Create a method named preprocess:

    This method should convert the text to lowercase and strip it of punctuation, returning a cleaned-up list of words.
- Create a method named count_words:

    This method should use the preprocess method to clean the text and then return a dictionary where the keys are words and the values are the counts of how often each word appears in the text.
    When printed, the dictionary should display word counts in a clear format.
- Create a method named get_top_n_words:

    This method should take an integer N as input and return a list of the top N most frequent words, sorted by frequency.
    If there is a tie in frequency, the words should be sorted lexicographically (i.e., based on the dictionary-like order of their characters). For example, "an" comes before "news" because 'a' has a lower ASCII value than 'n'.
    When printed, the method should display the top N words in a readable list format.
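
The counting and tie-breaking rules can also be sketched with the standard library's `collections.Counter`. This is an alternative illustration, not the required class; it assumes "punctuation" means the ASCII characters in `string.punctuation`.

```python
import string
from collections import Counter

text = "PacNews is an online news portal. PacNews delivers news fast."

# Lowercase, remove ASCII punctuation, then split on whitespace
cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
counts = Counter(cleaned.split())

# Sort by descending count first, then alphabetically to break ties
ranked = sorted(counts, key=lambda word: (-counts[word], word))
print(ranked[:3])  # ['news', 'pacnews', 'an']
```

Sorting on the pair `(-count, word)` applies both rules in a single pass: the negated count puts frequent words first, and the word itself decides ties.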

In [13]:
# Expected Output:

# Text Raw: PacNews is an online news portal. PacNews delivers news fast.
# Word Counts: {'pacnews': 2, 'is': 1, 'an': 1, 'online': 1, 'news': 2, 'portal': 1, 'delivers': 1, 'fast': 1}
# Top 3 Words: ['news', 'pacnews', 'an']

In [4]:
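import string

class WordCounter:
    def __init__(self, text):
        self.text = text

    def preprocess(self):
        # Lowercase the text, then drop every ASCII punctuation character
        text_lower = self.text.lower()
        text_clean = text_lower.translate(str.maketrans('', '', string.punctuation))
        cleaned_text = text_clean.split()
        return cleaned_text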
    
    def count_words(self):
        dictionary = {}
        for word in self.preprocess():
            if word not in dictionary:
                dictionary[word] = 1
            else:
                dictionary[word] += 1
        return dictionary

    def get_top_n_words(self, N):
        top_n_list = []
        sorted_words = sorted(self.count_words().items(), key= lambda word: (-word[1], word[0]))
        for word in sorted_words:
            top_n_list.append(word[0])
        return top_n_list[:N]


text_sample = "PacNews is an online news portal. PacNews delivers news fast."
word_counter = WordCounter(text_sample)

# Display initial text
print(f"Text Raw: {word_counter.text}")

# Count words and print the result
word_counts = word_counter.count_words()
print("Word Counts:", word_counts)

# Get top N words and print the result
top_words = word_counter.get_top_n_words(3)
print("Top 3 Words:", top_words)

Text Raw: PacNews is an online news portal. PacNews delivers news fast.
Word Counts: {'pacnews': 2, 'is': 1, 'an': 1, 'online': 1, 'news': 2, 'portal': 1, 'delivers': 1, 'fast': 1}
Top 3 Words: ['news', 'pacnews', 'an']


# Log Processor

Problem Statement:

Create a class called LogProcessor that is used to process log files stored as a list. Each log contains information about the event timestamp and the event status.

Specifications:

- Attributes:

    logs: An empty list to store logs.
    
- Methods:

    __init__: Initializes the logs attribute as an empty list.

    tambah_log(timestamp, status): Adds a new log to the logs list in dictionary format:
    
    {"timestamp": timestamp, "status": status}
    
    hitung_status(status): Returns the number of logs with a specific status.
    
    filter_log(start, end): Returns all logs with a timestamp between start and end.

- Input Log Data: Add the following log data to the class using the tambah_log method:

    - {"timestamp": "2025-01-01 12:00:00", "status": "ERROR"}
    - {"timestamp": "2025-01-01 12:05:00", "status": "INFO"}
    - {"timestamp": "2025-01-01 12:10:00", "status": "ERROR"}
    - {"timestamp": "2025-01-01 12:15:00", "status": "INFO"}
- Output Instructions:

    Count how many logs have the status "ERROR".
    
    Filter the logs with timestamps between "2025-01-01 12:00:00" and "2025-01-01 12:10:00".
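
One way to read "between start and end" is an inclusive range on parsed timestamps. The sketch below assumes every timestamp follows the `YYYY-MM-DD HH:MM:SS` layout of the sample data; `in_range` and `TIMESTAMP_FORMAT` are hypothetical names introduced for the example.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed layout, taken from the sample data


def in_range(timestamp, start, end):
    """Hypothetical helper: True if start <= timestamp <= end (inclusive)."""
    moment = datetime.strptime(timestamp, TIMESTAMP_FORMAT)
    lower = datetime.strptime(start, TIMESTAMP_FORMAT)
    upper = datetime.strptime(end, TIMESTAMP_FORMAT)
    return lower <= moment <= upper


logs = [
    {"timestamp": "2025-01-01 12:00:00", "status": "ERROR"},
    {"timestamp": "2025-01-01 12:05:00", "status": "INFO"},
    {"timestamp": "2025-01-01 12:10:00", "status": "ERROR"},
    {"timestamp": "2025-01-01 12:15:00", "status": "INFO"},
]

selected = [
    log for log in logs
    if in_range(log["timestamp"], "2025-01-01 12:00:00", "2025-01-01 12:10:00")
]
print(len(selected))  # 3
```

Parsing with `datetime.strptime` also rejects malformed timestamps with a `ValueError`, which a plain string comparison would silently accept.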

In [14]:
class LogProcessor:
    def __init__(self):
        self.logs = []

    def tambah_log(self, timestamp, status):
        self.logs.append({'timestamp': timestamp, 'status': status})

    def hitung_status(self, status):
        # Count the logs whose status matches the requested one
        count = 0
        for log in self.logs:
            if log['status'] == status:
                count += 1
        return count
    
    def filter_log(self, start, end):
        # Timestamps in 'YYYY-MM-DD HH:MM:SS' format sort chronologically as
        # plain strings, so an inclusive string comparison is enough here
        return [log for log in self.logs if start <= log['timestamp'] <= end]

result = LogProcessor()
result.tambah_log("2025-01-01 12:00:00", "ERROR")
result.tambah_log("2025-01-01 12:05:00", "INFO")
result.tambah_log("2025-01-01 12:10:00", "ERROR")
result.tambah_log("2025-01-01 12:15:00", "INFO")

print(f"Logs with status ERROR: {result.hitung_status('ERROR')}")
print(result.filter_log("2025-01-01 12:00:00", "2025-01-01 12:10:00"))

Logs with status ERROR: 2
[{'timestamp': '2025-01-01 12:00:00', 'status': 'ERROR'}, {'timestamp': '2025-01-01 12:05:00', 'status': 'INFO'}, {'timestamp': '2025-01-01 12:10:00', 'status': 'ERROR'}]
