### Objectives
- To create a mock pipeline based on the business model that students choose

### Introduction

Based on the previous readings, we will simulate how we can best determine:
- The state of the current (legacy) system
- Whether we should migrate the system to the cloud
- (Optional) The architecture of the cloud system


### Problem Statement
Suppose you're a decision maker on the engineering team at an eCommerce company that currently generates millions of dollars in net revenue. You have a legacy system that sits in the company's data centers in remote locations, away from the headquarters where your office is. The engineering team and the data analytics team have decided to build a search engine, a recommendation engine, and a few other engines that require heavy machine learning models to increase the company's revenue. As a decision maker, you need to assess the current system and the engineering requirements for the coming fiscal years, so you decide to answer some questions and prepare the answers for the next leadership meeting.


Currently, the capacity of the data centers is limited by the square footage of each location and by the hardware that can fit in that space. The data center resources are shared not only within the engineering team but also with the supply chain team, logistics, HR's IT platform, and other internal operations such as marketing, leadership, and the analytics team. The data centers have maxed out the hardware they can support, so the company may need to expand by opening another data center.

#### Assumptions when solving this problem

- The problem statement only describes the current business problem you're facing. The design of the hypothetical data center, such as its square footage, hardware capabilities, and human resources, is yours to create! The one limitation to keep in mind is that every data center has run out of space for more hardware.
- You can estimate how much data is generated per day from the revenue the company is making.
- Be creative! If you're using third-party data, e.g. Adobe Target for audience targeting, how are we receiving that data? Will that data sit in our data center, and will it take up resources? You are free to create these constraints, but be realistic.
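
As one way to act on the second assumption, the sketch below backs into a daily data volume from revenue. Every number in it (annual sales, average order value, events per order, bytes per event) is a hypothetical placeholder, and the function name `estimate_daily_data_gb` is made up for illustration; substitute the figures from your own scenario.

```python
# All figures are hypothetical assumptions, not values from the problem statement.
ANNUAL_SALES_USD = 50_000_000   # assumed annual sales
AVG_ORDER_VALUE_USD = 50        # assumed average order value
EVENTS_PER_ORDER = 200          # assumed logged events (views, searches, clicks) per completed order
BYTES_PER_EVENT = 1_000         # assumed size of one logged event


def estimate_daily_data_gb(annual_sales, avg_order_value, events_per_order, bytes_per_event):
    """Return a rough estimate of raw event data generated per day, in gigabytes."""
    orders_per_day = annual_sales / avg_order_value / 365
    bytes_per_day = orders_per_day * events_per_order * bytes_per_event
    return bytes_per_day / 1e9


daily_gb = estimate_daily_data_gb(
    ANNUAL_SALES_USD, AVG_ORDER_VALUE_USD, EVENTS_PER_ORDER, BYTES_PER_EVENT
)
print(f"Estimated raw event data: {daily_gb:.2f} GB/day")
```

The estimate only covers raw event logs; third-party feeds, backups, and model training data would add to it and can be modeled the same way.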

1. What is the current cost of maintaining the remote data center location?

In [1]:
# your solution

# The current cost of maintaining the remote data center location is the sum of the facility cost (the square footage) and the cost of the hardware housed in it.
#   - Facility cost: rent, electricity, cooling, security systems, and insurance.
#   - Hardware cost: servers, storage, networking equipment, backup systems, monitoring systems, and maintenance.
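
The breakdown above can be turned into a small calculation. This is a minimal sketch: the line items follow the two categories in the answer, but every dollar amount is an invented placeholder and `total_annual_cost` is a hypothetical helper name.

```python
# Hypothetical annual costs in USD; replace with figures from your own scenario.
facility_costs = {
    "rent": 600_000,
    "electricity": 250_000,
    "cooling": 120_000,
    "security": 80_000,
    "insurance": 50_000,
}
hardware_costs = {
    "servers": 400_000,
    "storage": 150_000,
    "networking": 90_000,
    "backup": 60_000,
    "monitoring": 30_000,
    "maintenance": 110_000,
}


def total_annual_cost(*cost_groups):
    """Sum every line item across the given cost dictionaries."""
    return sum(sum(group.values()) for group in cost_groups)


annual_cost = total_annual_cost(facility_costs, hardware_costs)
print(f"Facility: ${sum(facility_costs.values()):,}")
print(f"Hardware: ${sum(hardware_costs.values()):,}")
print(f"Total:    ${annual_cost:,}")
```

Keeping each category in its own dictionary makes it easy to add line items, such as staffing, without changing the function.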

2. What are the new business requirements from your engineering team? 

In [2]:
# your solution, but be creative! It doesn't have to be limited to the requirements in the problem statement

# The new business requirements from the engineering team are to build a search engine, a recommendation engine, and a few other engines that require heavy machine learning models to increase the company's revenue.

# - The search engine will allow the customers to search for products on the company's website. 
# - The recommendation engine will recommend products to the customers based on their browsing history and purchase history.
# - The other engines will provide insights to the marketing team, the leadership team, and the analytics team. 
#       - The marketing team will use the insights to target the right audience. 
#       - The leadership team will use the insights to make strategic decisions. 
#       - The analytics team will use the insights to improve the performance of the company.

3. Based on the new business requirements, you decide that you need to start migrating (at least some of) your resources to the cloud. What are some things that you may need to look into when considering migrating your legacy system to the cloud?

In [2]:
# your solution

# When considering migrating the legacy system to the cloud, you may need to look into the following things:

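# 1. Cost: one-time migration expenses and ongoing usage fees, compared with the current data center cost

# 2. Security: data privacy, compliance, and third-party risk

# 3. Scalability: whether resources can scale up or down with demand

# 4. Performance: latency and speed for the search and recommendation workloads

# 5. Reliability: availability, backups, and disaster recovery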
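
One way to compare the options across these five criteria is a weighted decision matrix. The sketch below is only an illustration: the weights and the 1-to-5 scores are invented assumptions, and `weighted_score` is a hypothetical helper, so the result says nothing about which option is actually better.

```python
# Hypothetical weights (summing to 1.0) for the five criteria listed above.
weights = {
    "cost": 0.30,
    "security": 0.20,
    "scalability": 0.20,
    "performance": 0.15,
    "reliability": 0.15,
}

# Hypothetical scores from 1 (poor) to 5 (excellent) for each option.
scores = {
    "legacy": {"cost": 3, "security": 4, "scalability": 1, "performance": 3, "reliability": 3},
    "cloud": {"cost": 3, "security": 3, "scalability": 5, "performance": 4, "reliability": 4},
}


def weighted_score(option_scores, criterion_weights):
    """Return the weighted sum of an option's scores."""
    return sum(criterion_weights[c] * option_scores[c] for c in criterion_weights)


for option, option_scores in scores.items():
    print(f"{option}: {weighted_score(option_scores, weights):.2f}")

best = max(scores, key=lambda option: weighted_score(scores[option], weights))
print(f"Highest weighted score: {best}")
```

Changing the weights is a quick way to show leadership how sensitive the recommendation is to what the company values most.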


4. In the end, you'll need to persuade leadership that you need to migrate the system elsewhere, i.e. to the cloud, even though, in the eyes of leadership, everything is working fine. Write out the pros and cons by shared category, such as cost and human resources.

In [4]:
# your solution

# Pros of Migrating to the Cloud:

"""1. Cost Savings

	•	Reduced Hardware Costs: No need for maintaining or purchasing physical servers, which reduces capital expenditure.
	•	Pay-as-you-go Model: Only pay for what you use, avoiding overprovisioning.
	•	Energy Efficiency: Lower energy consumption since you are not running physical infrastructure in-house.
	•	Maintenance Reduction: Cloud providers handle hardware upkeep, leading to fewer in-house IT maintenance costs.

2. Human Resources

	•	Less IT Overhead: Internal teams can focus on innovation rather than maintaining infrastructure.

3. Scalability

	•	Elastic Resources: Easily scale up or down based on demand, avoiding the risk of under- or over-provisioning.
	•	Global Reach: If the business expands, cloud infrastructure can be accessed globally with minimal latency issues.

4. Disaster Recovery

	•	High Availability and Reliability: Built-in redundancy across data centers for backups and failovers, reducing the risk of downtime.
	•	Backup and Recovery: Easier and faster recovery options, minimizing potential data loss and recovery times.

5. Performance

	•	Improved Speed: Cloud services can outperform on-premises infrastructure in speed and reliability, especially when the on-premises hardware is at capacity.
	•	Access to AI Tools: Cloud environments provide access to AI, machine learning, and analytics tools that might be cost-prohibitive to implement on-premises.
 """

# Cons of Migrating to the Cloud:

""" 1. Migration

	•	Migration Costs: Initial expenses may be high due to data transfer, new architecture, and potential refactoring of applications.
	•	Training Costs: Teams may require additional training to operate in a cloud environment, which involves an upfront investment.

2. Human Resources

	•	Skill Gap: Not all IT staff may have cloud experience, requiring either hiring or retraining, which could slow down the migration.
	•	Resistance to Change: Employees might resist the move if they are accustomed to on-premises infrastructure.

3. Ongoing Operational Costs

	•	Uncontrolled Spending: If not managed well, cloud usage can result in unpredictable and higher-than-expected costs due to scaling and additional services.

4. Security Concerns

	•	Data Privacy and Compliance: Depending on your cloud provider, data privacy and compliance could be more challenging, especially in regulated industries (e.g., healthcare).
	•	Third-party Risks: Entrusting data to third-party vendors introduces new security risks.

5. Performance Uncertainty

	•	Latency Issues: Some cloud applications may suffer from latency issues depending on geographic location or bandwidth.
	•	Dependence on Internet Connectivity: A solid internet connection is a must; downtime or slow speeds could impact productivity. """
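
To put numbers behind the cost items on both lists, the sketch below compares cumulative spend for expanding the data center against migrating to the cloud. All amounts are hypothetical assumptions (a large up-front build-out with lower yearly cost versus a smaller migration cost with higher yearly usage fees), and both function names are made up for illustration.

```python
def cumulative_costs(upfront, annual, years):
    """Return the cumulative cost at the end of each year, starting with year 1."""
    return [upfront + annual * year for year in range(1, years + 1)]


def cheaper_option_by_year(on_prem, cloud):
    """Label the cheaper option for each year; ties go to on_prem."""
    return ["cloud" if c < p else "on_prem" for p, c in zip(on_prem, cloud)]


YEARS = 5
# Hypothetical inputs in USD.
on_prem = cumulative_costs(upfront=3_000_000, annual=2_000_000, years=YEARS)
cloud = cumulative_costs(upfront=1_500_000, annual=2_400_000, years=YEARS)
cheaper = cheaper_option_by_year(on_prem, cloud)

for year, (p, c, winner) in enumerate(zip(on_prem, cloud, cheaper), start=1):
    print(f"Year {year}: on-prem ${p:,} vs cloud ${c:,} -> {winner}")
```

With these made-up inputs the cloud is cheaper for the first three years and the expanded data center is cheaper after that, which is the kind of trade-off the "Uncontrolled Spending" item warns about.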

(OPTIONAL) This may require an engineering/software background. You have the same hypothetical machine learning architecture deployed twice: once on the legacy system and once in the cloud. Setting the business logic aside, what are the benefits for the engineering stack if we use the cloud? What can be problematic if we use the cloud?

In [5]:
# your solution, attach a screenshot of a hypothetical architecture at a high level

# The benefits of using the cloud for the machine learning architecture are scalability, flexibility, and cost-effectiveness. You can scale up or down with demand and pay only for what you use, which helps control costs. The cloud also gives you flexibility in the resources you use, such as storage, compute, and networking.

# Hypothetical architecture at a high level (described in words, naming the services used):

# Cloud Storage: Data is stored in a cloud storage service such as Amazon S3.
# Cloud Computing: Data processing is done using cloud computing resources such as Amazon EC2.
# Machine Learning Models: Machine learning models are deployed on cloud servers using services like Amazon SageMaker.
# APIs: APIs interact with the machine learning models to provide predictions and insights to end-users.
# Monitoring: Monitoring services such as Amazon CloudWatch are used to monitor the performance and health of the architecture.
# Security: Security services such as AWS Identity and Access Management (IAM) are used to secure the architecture.
# Networking: Networking services such as Amazon VPC are used to connect the different components of the architecture.
# Cost Management: Cost management services such as AWS Cost Explorer are used to monitor and optimize the costs of the architecture.
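
In place of a screenshot, the components listed above can be written down as data and rendered as a text diagram. This is a rough sketch: the left-to-right order of the data path is an assumption, the "APIs" layer is left unspecified because the list names no service for it, and `render_architecture` is a hypothetical helper.

```python
# Assumed data path, in order; service names come from the list above.
DATA_PATH = [
    ("Cloud Storage", "Amazon S3"),
    ("Cloud Computing", "Amazon EC2"),
    ("Machine Learning Models", "Amazon SageMaker"),
    ("APIs", "unspecified"),  # no specific service is named for this layer
]

# Services that apply across the whole data path.
CROSS_CUTTING = {
    "Monitoring": "Amazon CloudWatch",
    "Security": "AWS IAM",
    "Networking": "Amazon VPC",
    "Cost Management": "AWS Cost Explorer",
}


def render_architecture(data_path, cross_cutting):
    """Return a plain-text diagram: one line for the data path, then supporting services."""
    lines = [" -> ".join(f"{layer} ({service})" for layer, service in data_path)]
    lines.append("Cross-cutting services:")
    lines.extend(f"  - {layer}: {service}" for layer, service in cross_cutting.items())
    return "\n".join(lines)


diagram = render_architecture(DATA_PATH, CROSS_CUTTING)
print(diagram)
```

Separating the data path from the cross-cutting services keeps the diagram readable and mirrors how the list above groups them.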

# Some potential problems that can arise from the architecture include:

# 1. Data security: Storing sensitive data in the cloud can pose security risks, such as data breaches or unauthorized access.

# 2. Scalability challenges: While the architecture is designed to be scalable, there may be challenges in scaling up or down based on demand.

# 3. Performance bottlenecks: The architecture may face performance bottlenecks due to network latency, data processing delays, or resource contention.

# 4. Cost management: While the cloud offers cost-effective solutions, improper resource allocation or inefficient use of cloud services can lead to cost overruns.

# 5. Compliance and regulatory issues: The architecture needs to comply with data protection regulations and industry standards to ensure legal and ethical use of data.
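
The cost management risk in item 4 can be caught early with a simple budget check. The sketch below linearly extrapolates month-to-date spend; it is not how AWS Cost Explorer forecasts costs, the daily figures and budget are invented, and both function names are hypothetical.

```python
def projected_month_end_spend(daily_spend, days_in_month=30):
    """Linearly extrapolate the average daily spend so far to a full month."""
    if not daily_spend:
        return 0.0
    average_per_day = sum(daily_spend) / len(daily_spend)
    return average_per_day * days_in_month


def over_budget(daily_spend, monthly_budget, days_in_month=30):
    """Return True when the projected month-end spend exceeds the budget."""
    return projected_month_end_spend(daily_spend, days_in_month) > monthly_budget


# Hypothetical daily cloud spend in USD for the first five days of a month.
spend_so_far = [5_000, 5_200, 7_500, 9_000, 8_800]
MONTHLY_BUDGET = 200_000  # hypothetical budget

projection = projected_month_end_spend(spend_so_far)
print(f"Projected month-end spend: ${projection:,.0f}")
print(f"Over budget: {over_budget(spend_so_far, MONTHLY_BUDGET)}")
```

A check like this could run daily so that a sudden jump in usage is noticed well before the bill arrives.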

