diff --git a/docs/ai-ml/machine-learning/index.mdx b/docs/ai-ml/machine-learning/index.mdx deleted file mode 100644 index e345ed2..0000000 --- a/docs/ai-ml/machine-learning/index.mdx +++ /dev/null @@ -1 +0,0 @@ - \ No newline at end of file diff --git a/docs/machine-learning/fundamentals/data-splitting.mdx b/docs/machine-learning/fundamentals/data-splitting.mdx new file mode 100644 index 0000000..e69de29 diff --git a/docs/machine-learning/fundamentals/ml-workflow.mdx b/docs/machine-learning/fundamentals/ml-workflow.mdx new file mode 100644 index 0000000..e69de29 diff --git a/docs/machine-learning/fundamentals/types-of-learning.mdx b/docs/machine-learning/fundamentals/types-of-learning.mdx new file mode 100644 index 0000000..e69de29 diff --git a/docs/machine-learning/fundamentals/what-is-ml.mdx b/docs/machine-learning/fundamentals/what-is-ml.mdx new file mode 100644 index 0000000..fb7f92b --- /dev/null +++ b/docs/machine-learning/fundamentals/what-is-ml.mdx @@ -0,0 +1,92 @@ +--- +title: "What is Machine Learning (ML)?" +sidebar_label: "What is ML?" +description: "Define Machine Learning, its key characteristics, and how it differs from traditional programming." +tags: + [ + machine-learning, + ml, + definition, + ai, + traditional-programming, + data-driven, + algorithms, + ] +--- + +Machine Learning is a subset of Artificial Intelligence (AI) that focuses on building systems capable of learning patterns and making decisions or predictions directly from data, rather than following static, explicitly programmed instructions. + +## The Formal Definition + +A widely accepted, formal definition of Machine Learning was provided by computer scientist **Tom M. Mitchell** in 1997: + +> A computer program is said to learn from **Experience ($E$)** with respect to some **Task ($T$)** and some **Performance measure ($P$)**, if its performance on $T$, as measured by $P$, improves with experience $E$. + +Let's break down this concept with a simple example: **Spam Filtering**. 
+ +| Component | Description | Spam Filtering Example | +| :--- | :--- | :--- | +| **Task ($T$)** | The problem the ML system is trying to solve. | Classifying an email as "Spam" or "Not Spam (Ham)". | +| **Experience ($E$)** | The data the ML system uses to train itself. | A large dataset of historical emails labeled as either spam or ham. | +| **Performance ($P$)** | A metric used to evaluate the system's success. | **Accuracy:** The percentage of emails correctly classified. | + +:::tip +The core idea is that the program's ability to classify new, unseen emails gets better the more labeled examples it processes. The program *learns* the rules itself. +::: + +## ML vs. Traditional Programming + +This is the most crucial concept when starting out. Machine Learning fundamentally shifts the paradigm of software development. + + + + + + + In traditional programming, you (the programmer) write explicit **Rules** (algorithms, logic, conditions) that process **Data** to produce an **Answer**. + + ```mermaid + graph LR + A[Data] --> B(Rules/Program); + B --> C[Answer]; + ``` + +**Example (Temperature Conversion):** +You explicitly write the formula: `Fahrenheit = (Celsius * 9/5) + 32`. The computer executes this static rule. + + + + +In Machine Learning, you feed the system the **Data** and the desired **Answers** (Labels), and the ML algorithm generates the **Rules** (the Model) that map the input to the output. + +```mermaid +graph LR + A[Data] --> B(ML Algorithm); + C[Answers/Labels] --> B; + B --> D[Rules/Model]; +``` + +**Example (Predicting House Price):** +You feed it past house data (size, location) and the final sale price. The ML algorithm creates a mathematical model (the "Rule") that predicts the price of a *new* house based on its features. + + + + +## Key Characteristics of Machine Learning + + * **Data-Driven:** ML models learn from examples, so they need enough representative, high-quality data to learn effectively; how much depends on the task and the model. 
+ * **Automatic Pattern Discovery:** The system discovers hidden patterns, correlations, and rules in the data without human intervention. + * **Generalization:** A good ML model can accurately predict or classify data it has never seen before (that is, it performs well on new inputs, not only on the examples it was trained on). + * **Iterative Process:** Developing an ML model is a cyclical process of data collection, training, evaluation, and refinement. + +## Where is ML Used? + +Machine Learning is the engine behind many everyday technologies: + +| Domain | Application | ML Task | +| :--- | :--- | :--- | +| **E-commerce** | Recommendation Systems (e.g., "People who bought X also bought Y") | Classification / Ranking | +| **Healthcare** | Tumor detection in X-rays or MRIs | Image Segmentation / Classification | +| **Finance** | Fraud detection in credit card transactions | Anomaly Detection / Classification | +| **Speech** | Voice assistants (Siri, Alexa) | Speech Recognition / Natural Language Processing (NLP) | +| **Transportation** | Self-driving cars | Computer Vision / Reinforcement Learning | \ No newline at end of file diff --git a/docs/machine-learning/introduction.mdx b/docs/machine-learning/introduction.mdx new file mode 100644 index 0000000..fc6b94e --- /dev/null +++ b/docs/machine-learning/introduction.mdx @@ -0,0 +1,154 @@ +--- +title: Introduction to Machine Learning +sidebar_label: Introduction +description: "A comprehensive introduction to the Machine Learning Tutorial structure, purpose, and key learning outcomes for CodeHarborHub learners." +tags: + [ + machine-learning, + ml, + introduction, + ai, + data-science, + tutorial, + codeharborhub, + roadmap, + ml-engineer, + ] +--- + +Welcome to the **CodeHarborHub Machine Learning Tutorial**! This is your official gateway into the transformative world of Artificial Intelligence, data analysis, and predictive modeling. 
+ +:::info +Machine Learning is not just about complex algorithms; it is about building systems that learn from data to make decisions or predictions *without* being explicitly programmed for every outcome. +::: + +## Why Machine Learning Now? + +The demand for ML skills is soaring across every industry—from finance and healthcare to entertainment and autonomous technology. By learning ML, you are gaining one of the most valuable and future-proof skill sets in the 21st century. + +### What You Will Learn + +This tutorial provides a complete, structured roadmap to transform you into a proficient ML practitioner. By the end, you will master: + +1. **Foundations:** The mathematical and statistical bedrock of ML. +2. **Core Algorithms:** Implementing models like Linear Regression, Support Vector Machines, and K-Means. +3. **Deep Learning:** Building advanced Neural Networks (CNNs, RNNs, Transformers). +4. **Practical Workflow:** Handling real-world data, evaluating models, and deploying solutions (MLOps). +5. **Coding:** Writing efficient, production-ready Python code using libraries like NumPy, Pandas, and Scikit-learn. + +## Tutorial Structure Overview + +This curriculum is designed as a deep, sequential progression. We move from the absolute basics (Math and Programming) to advanced deployment strategies. + + + + ### The Bedrock of ML + This initial stage ensures you have the solid academic footing required for understanding the algorithms. + + * **Mathematics:** Linear Algebra (Vectors, Matrices, Tensors) and Calculus (Derivatives, Gradients). For instance, the **Gradient Descent** optimization algorithm relies heavily on the partial derivative concept: + $$ + \theta_{j} := \theta_{j} - \alpha \frac{\partial}{\partial \theta_{j}} J(\theta) + $$ + * **Statistics & Probability:** Concepts like probability distributions, conditional probability, and data visualization. + * **Programming Fundamentals:** Mastering Python, NumPy, and Pandas. 
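The Gradient Descent update rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not part of the tutorial's own code: it assumes a one-feature linear model and a mean-squared-error cost $J(\theta)$, and the function name `gradient_descent`, the learning rate `alpha`, the step count, and the toy data are all hypothetical choices made for the example.

```python
def gradient_descent(xs, ys, alpha=0.05, steps=2000):
    """Fit y ~ theta0 + theta1 * x by repeatedly applying
    theta_j := theta_j - alpha * dJ/dtheta_j,
    where J(theta) = (1 / (2m)) * sum((prediction - y) ** 2).
    """
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        # Prediction error for every training example under the current parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of J with respect to theta0 and theta1.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update of both parameters.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


# Toy data generated from y = 1 + 2x (an assumption for illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
theta0, theta1 = gradient_descent(xs, ys)
print(round(theta0, 3), round(theta1, 3))
```

The learning rate $\alpha$ controls the step size: too large and the updates can diverge, too small and convergence becomes slow. Real projects would normally rely on a library optimizer rather than a hand-written loop like this one.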
+ + + ### Algorithms and Architectures + Here, you start building models and diving into neural networks. + + * **ML Core:** Supervised, Unsupervised, and Reinforcement Learning paradigms. + * **Data Engineering:** Preprocessing data, handling missing values, and the critical step of **Feature Engineering**. + * **Deep Learning:** Understanding Perceptrons, Backpropagation, and specialized networks (CNNs for images, RNNs/Transformers for text). + + + ### Real-World Application + The final stage focuses on specialized fields and moving models into production. + + * **NLP:** Tokenization, Embeddings, and Attention Mechanisms for text processing. + * **Explainable AI (XAI):** Tools like LIME and SHAP to interpret complex model decisions. + * **MLOps:** The engineering discipline of deploying, monitoring, and maintaining ML models in a reliable and reproducible way (CI/CD, Model Versioning). + + + +--- + +## The Machine Learning Engineer Role + +Understanding the role helps you align your learning goals. + +| Aspect | ML Engineer | AI Engineer | +| :--- | :--- | :--- | +| **Primary Focus** | Production-level implementation, deployment, MLOps, scalability, data pipelines. | Research, development of novel AI models (especially Deep Learning/Generative AI), fine-tuning large models. | +| **Core Skills** | Python, Cloud (AWS/Azure/GCP), Docker, CI/CD, Scikit-learn, TensorFlow/PyTorch, **Data Engineering**. | Strong math/research background, Deep Learning frameworks, model optimization, **State-of-the-Art** techniques. | +| **Goal** | Make models reliably work in production at scale. | Create new intelligence capabilities or highly specialized models. | + +:::success +This tutorial provides a strong foundation for **both** roles, with a dedicated focus on the practical implementation skills needed for the **ML Engineer** track. 
+::: + +## Types of Machine Learning + +```mermaid +mindmap + root((Machine Learning)) + Supervised Learning + Regression + Classification + Unsupervised Learning + Clustering + Dimensionality Reduction + Reinforcement Learning + Reward Systems + Agents & Environment +``` + + + + Learn from labeled data (input → correct output). + Examples: + * House price prediction + * Spam detection + * Disease prediction + + + + Find hidden patterns in data without labels. + Examples: + * Customer segmentation + * Anomaly detection + * Data clustering + + + + Learn through rewards and penalties. + Examples: + * Robotics + * Game AI + * Autonomous vehicles + + + +## Tools You Will Use + + + + Python is the primary language for ML due to its simplicity and rich ecosystem. + + + + - NumPy + - Pandas + - Matplotlib / Seaborn + - Scikit-learn + - TensorFlow + - PyTorch + + + + Jupyter Notebooks help you write code, visualize results, and document your workflow. + + + +## Ready to Begin? + +Start by learning the fundamental definition of Machine Learning and the core concepts that define this field. \ No newline at end of file diff --git a/docs/machine-learning/ml-engineer-vs-ai-engineer.mdx b/docs/machine-learning/ml-engineer-vs-ai-engineer.mdx new file mode 100644 index 0000000..297c413 --- /dev/null +++ b/docs/machine-learning/ml-engineer-vs-ai-engineer.mdx @@ -0,0 +1,70 @@ +--- +title: "ML Engineer vs. AI Engineer" +sidebar_label: "MLE vs. AIE" +description: "A clear comparison of the Machine Learning Engineer, AI Engineer, and Data Scientist roles, focusing on responsibilities, tools, and project scope." +tags: + [ + ml-engineer, + ai-engineer, + data-scientist, + comparison, + roles, + career-path, + ai, + ml, + ] +--- + +The titles in the Artificial Intelligence (AI) domain often overlap, leading to confusion. 
While job descriptions vary widely by company, we can define the typical focus area for the three core roles: **Data Scientist (DS)**, **Machine Learning Engineer (MLE)**, and **AI Engineer (AIE)**. + + +## 1. Data Scientist (DS): The Statistician & Modeler + +The DS role is primarily focused on **discovery and experimentation**. + +* **Goal:** To answer business questions using data, uncover patterns, and build predictive models in an experimental environment (e.g., Jupyter Notebooks). +* **Focus:** **Why** and **What** is the data telling us? They are the domain experts in statistical modeling and analysis. +* **Key Responsibilities:** + * Statistical analysis and hypothesis testing. + * Developing novel modeling approaches. + * Data visualization and storytelling with data. + * Communicating insights to stakeholders. +* **Tools:** Python, R, Pandas, Scikit-learn, statistical packages. + +## 2. Machine Learning Engineer (MLE): The Production Expert + +The MLE role is the bridge between the experimental DS model and the production system. + +* **Goal:** To turn high-performing models into reliable, scalable services used by millions of users. +* **Focus:** **How** do we integrate this model into the product pipeline? They are system-level engineers specializing in ML. +* **Key Responsibilities:** + * Designing and implementing robust data pipelines. + * Deploying models using MLOps tools (Docker, Kubernetes). + * Monitoring model performance (drift detection, latency). + * Optimizing model code for speed and efficiency. +* **Tools:** Python, Cloud Platforms (AWS, Azure, GCP), Docker, Kubernetes, CI/CD, MLflow/DVC. + +## 3. AI Engineer (AIE): The Advanced Modeler & Specialist + +The AIE role is often used interchangeably with MLE, but when distinct, it typically focuses on **cutting-edge AI domains**. + +* **Goal:** To work with and advance complex, high-impact AI systems, particularly in Deep Learning, NLP, and Computer Vision. 
+* **Focus:** **What** state-of-the-art model should we use? They specialize in specific deep learning architectures. +* **Key Responsibilities:** + * Implementing and fine-tuning large, complex models (e.g., Transformers, LLMs, Generative Models). + * Optimizing GPU/TPU utilization for training large neural networks. + * Researching and adopting new AI architectures. +* **Tools:** PyTorch, TensorFlow, Hugging Face, distributed training frameworks. + +## Comparison Table + +| Feature | Data Scientist (DS) | ML Engineer (MLE) | AI Engineer (AIE) | +| :--- | :--- | :--- | :--- | +| **Primary Output** | Insights, Reports, Experimental Models | Production-Ready ML Services/APIs | Specialized Deep Learning Systems | +| **Core Skill** | Statistics, Modeling, Domain Knowledge | Software Engineering, MLOps, System Design | Deep Learning, Advanced AI Architectures | +| **Project Stage** | Exploration & Proof-of-Concept | Deployment & Maintenance | Research & Implementation of Advanced Models | +| **Typical Stack** | Python/R, Jupyter, Scikit-learn | Python, Docker, Kubernetes, Cloud SDKs | Python, PyTorch/TensorFlow, GPUs/TPUs | + +:::important +**CodeHarborHub's Focus:** This tutorial is geared towards the **Machine Learning Engineer** skillset. We will give you the *modeling foundation* of a Data Scientist and the *engineering discipline* of a Software Engineer, emphasizing the MLOps skills needed for real-world production. +::: \ No newline at end of file diff --git a/docs/machine-learning/ml-lifecycle.mdx b/docs/machine-learning/ml-lifecycle.mdx new file mode 100644 index 0000000..5840e7a --- /dev/null +++ b/docs/machine-learning/ml-lifecycle.mdx @@ -0,0 +1,86 @@ +--- +title: "The Machine Learning Lifecycle (MLLC)" +sidebar_label: "ML Lifecycle" +description: "A step-by-step guide to the Machine Learning Lifecycle, from problem definition and data collection to model deployment and monitoring." 
+tags: + [ + ml-lifecycle, + mllc, + mlops, + data-science-process, + deployment, + monitoring, + data-preparation, + ] +--- + +The Machine Learning Lifecycle (MLLC) is a structured, iterative process that guides an ML team from the initial business problem definition to the final deployment and ongoing maintenance of the predictive model. Unlike traditional software development, the MLLC is heavily reliant on data and model performance. + + +## 1. Business Understanding and Problem Framing + +This initial stage determines the project's **feasibility and direction**. Without a clear goal, the entire project is set up for failure. + +* **Define the Goal:** What business metric are we trying to improve (e.g., increase customer click-through rate, reduce equipment failure)? +* **Define the ML Task:** Translate the business goal into a specific ML task (e.g., Is it **Classification** to predict a yes/no outcome? Is it **Regression** to predict a continuous value?). +* **Define Success:** What is the minimum acceptable performance metric ($P$) for the model to be considered useful? (e.g., 90% accuracy, AUC of 0.85). + +## 2. Data Acquisition and Preparation + +The most time-consuming stage, where data is gathered, cleaned, and prepared for modeling. + +* **Data Acquisition:** Identifying sources (databases, APIs, logs) and extracting the raw data. +* **Data Cleaning:** Handling missing values, correcting errors, and dealing with outliers. +* **Feature Engineering:** Creating new, informative variables from the raw data. This step is critical for model performance. +* **Data Splitting:** Dividing the data into **Training, Validation, and Test** sets to ensure robust model evaluation. + +## 3. Model Development and Training + +This is where the algorithms come into play, and the model learns from the prepared data. 
+ +* **Algorithm Selection:** Choosing an appropriate model based on the ML task (e.g., Linear Regression for simple predictions, Neural Networks for complex image data). +* **Training:** Feeding the training data to the algorithm and optimizing the model's parameters (e.g., weights and biases) to minimize the **Loss Function**. +* **Hyperparameter Tuning:** Fine-tuning parameters *outside* of the learning process (e.g., learning rate, number of layers) using the **Validation** set. + +## 4. Model Evaluation + +Assessing how well the trained model performs and whether it meets the success criteria defined in Step 1. + +* **Metric Calculation:** Calculating the defined performance metrics ($P$) using the unseen **Test** set (e.g., Precision, Recall, F1-Score, RMSE). +* **Bias and Fairness:** Checking for unintended biases in predictions across different groups. +* **Validation:** Ensuring the model generalizes well and is not **overfitting** (performing great on training data, poorly on new data) or **underfitting** (performing poorly overall). + +## 5. Deployment + +The process of integrating the model into a live application or business process, making its predictions accessible in real-time. + +* **Packaging:** Using **Docker** to containerize the model along with its required dependencies. +* **Serving:** Deploying the model as an API endpoint (e.g., using Flask/FastAPI) via services like Kubernetes or cloud-native ML platforms. +* **Testing:** Conducting live tests (e.g., **A/B Testing**) to compare the new model's performance against the old system or baseline. + +## 6. Monitoring and Maintenance + +The cycle does not end at deployment. Models degrade over time due to changes in real-world data. + +* **Performance Monitoring:** Continuously tracking the model's actual performance metrics against the baseline. 
+* **Data Drift Detection:** Alerting the team when the characteristics of the *input data* change significantly from the training data, leading to performance decay. +* **Retraining:** Establishing automated pipelines to retrain and update the model periodically or when performance drops below a critical threshold. + +--- + + + + :::important + **MLOps** is the practice that makes the MLLC possible in a production environment. It's a set of processes and tools (CI/CD, Monitoring) that ensure the transition between all these stages is seamless, automated, and reliable. + ::: + + + :::tip + The MLLC is a **loop**. If the model fails evaluation (Step 4) or degrades in monitoring (Step 6), the team must iterate back to the Data Preparation (Step 2) or Modeling (Step 3) stages. + ::: + + + +--- + +This concludes the **Introduction** section of the Machine Learning Tutorial! You now have a solid understanding of what ML is, who builds it, and the process they follow. \ No newline at end of file diff --git a/docs/machine-learning/role-of-ml-engineer.mdx b/docs/machine-learning/role-of-ml-engineer.mdx new file mode 100644 index 0000000..1bb25d7 --- /dev/null +++ b/docs/machine-learning/role-of-ml-engineer.mdx @@ -0,0 +1,85 @@ +--- +title: "Role of a Machine Learning Engineer" +sidebar_label: "ML Engineer Role" +description: "Understand the core responsibilities, required skill set, and day-to-day tasks of a Machine Learning Engineer in a professional setting." +tags: + [ + ml-engineer, + role, + responsibilities, + skills, + mlops, + data-scientist, + ai-engineer, + ] +--- + +The Machine Learning Engineer (MLE) sits at the critical intersection of **Data Science** and **Software Engineering**. Their primary responsibility is to bridge the gap between experimental models created by data scientists and reliable, scalable systems that operate in production. + +## Core Responsibilities + +An ML Engineer's job revolves around the end-to-end lifecycle of an ML project. 
+ +### 1. Productionizing Models (MLOps) +This is arguably the most distinguishing task. An MLE takes a working model (e.g., one prototyped in a Python notebook) and turns it into a service that can handle thousands of requests per second with high reliability and low latency. + +* **Deployment:** Using tools like Docker, Kubernetes, and cloud services (AWS SageMaker, Azure ML, Google Cloud Vertex AI) to serve the model via an API. +* **Scalability:** Ensuring the model can handle a growing volume of data and users. + +### 2. Data Engineering & Preprocessing +High-quality, correctly structured data is essential. MLEs often design and maintain the pipelines that feed data to the model. + +* **ETL/ELT:** Designing pipelines to Extract, Transform, and Load data efficiently. +* **Feature Engineering:** Creating meaningful input features from raw data that help the model learn better. + +### 3. Model Training and Optimization +While Data Scientists may focus on model research, MLEs focus on making that model efficient. + +* **Hyperparameter Tuning:** Tuning settings that are fixed before training (e.g., learning rate) to improve model performance. +* **Code Optimization:** Rewriting and optimizing training code for speed, often leveraging GPUs or distributed computing. + +### 4. Monitoring and Maintenance +Once deployed, the model must be continuously monitored for performance degradation. + +* **Drift Detection:** Identifying when **data drift** (input data changes) or **model drift** (model performance degrades over time) occurs. +* **Retraining:** Automating the process of retraining and updating the model to maintain accuracy. + +## Essential Skill Set + +The MLE role requires a strong blend of theoretical knowledge and practical engineering skills. + + + + + * **Programming:** Mastery of Python (and often C++ or Java for performance). + * **MLOps Tools:** Docker, Kubernetes, CI/CD tools (GitLab CI/CD, GitHub Actions). 
+ * **System Design:** Understanding microservices, REST APIs, and system architecture for serving models. + * **Databases:** Strong SQL and NoSQL skills. + + + + + * **Algorithms:** Deep understanding of common ML and Deep Learning algorithms. + * **Frameworks:** Expertise in PyTorch, TensorFlow, and Scikit-learn. + * **Statistics:** Understanding model evaluation metrics (e.g., precision, recall, AUC). + * **Experiment Tracking:** Using tools like MLflow or Weights & Biases. + + + + + * **Problem-Solving:** Deconstructing complex, ambiguous problems into solvable ML tasks. + * **Communication:** Clearly explaining complex technical results to both engineers and business stakeholders. + * **Collaboration:** Working closely with Data Scientists, Data Engineers, and DevOps teams. + + + + +## Example: A Day in the Life + +:::note +An MLE's day often shifts between writing robust code and solving model-specific issues. + +1. **Morning:** Reviewing model performance dashboards. Debugging a spike in latency for the recommendation system. +2. **Mid-day:** Collaborating with the Data Science team on a new feature set; implementing the data preprocessing logic to ensure **reproducibility** between training and serving environments. +3. **Afternoon:** Writing a Dockerfile and a Kubernetes deployment script to A/B test a newly trained model against the production baseline. +::: diff --git a/docs/machine-learning/skills-and-responsibilities.mdx b/docs/machine-learning/skills-and-responsibilities.mdx new file mode 100644 index 0000000..19e76b8 --- /dev/null +++ b/docs/machine-learning/skills-and-responsibilities.mdx @@ -0,0 +1,103 @@ +--- +title: "Skills and Responsibilities for ML Engineers" +sidebar_label: "Essential ML Skills" +description: "A detailed breakdown of the technical, mathematical, and soft skills required to succeed as a Machine Learning Engineer." 
+tags: + [ + ml-skills, + responsibilities, + career-roadmap, + programming, + math, + mlops, + data-skills, + ] +--- + +Success as a Machine Learning Engineer requires a "triple threat" combination: strong **mathematical/statistical foundations**, robust **programming/engineering skills**, and practical **ML application knowledge**. + +## 1. Technical Skills (The "How-to") + +These are the tools and languages you will use daily to build and deploy systems. + +### A. Programming Mastery: Python +Python is the undisputed leader in ML. You must go beyond basic syntax and understand: + +* **Libraries:** Expert use of **NumPy** (for numerical operations), **Pandas** (for data manipulation), and **Scikit-learn** (for classical ML algorithms). +* **Performance:** Writing vectorized code, understanding time and space complexity, and optimizing functions. +* **Software Engineering:** Knowledge of Object-Oriented Programming (OOP), version control (Git), and writing clean, testable code. + +### B. Machine Learning Frameworks +You need proficiency in at least one major Deep Learning framework: + + + + :::tip + Known for its dynamic computation graph, making it popular for research and flexibility. + ::: + ```py + # Example: Defining a simple PyTorch model + import torch.nn as nn + + class SimpleNet(nn.Module): + def __init__(self): + super(SimpleNet, self).__init__() + self.linear = nn.Linear(784, 10) + + def forward(self, x): + return self.linear(x) + ``` + + + :::tip + Known for its production readiness and scalable deployment tools (TFLite, TFServing). + ::: + ```py + # Example: Defining a simple Keras model + from tensorflow import keras + from tensorflow.keras import layers + + model = keras.Sequential([ + layers.Dense(64, activation='relu', input_shape=(784,)), + layers.Dense(10, activation='softmax') + ]) + ``` + + + +### C. MLOps and Deployment +This separates a good **Data Scientist** from a functioning **ML Engineer**. 
+ +* **Containerization:** Using **Docker** to package models and dependencies. +* **Orchestration:** Basic understanding of **Kubernetes** for managing containerized applications at scale. +* **Cloud Platforms:** Experience with ML services on AWS (SageMaker), Google Cloud (Vertex AI), or Azure (Azure ML). + +## 2. Foundational Skills (The "Why") + +These skills provide the intuition necessary to design, debug, and select the right algorithms. + +### A. Mathematics +* **Linear Algebra:** Vectors, matrices, and matrix operations are the basis for how data is represented and processed in neural networks. +* **Calculus:** Essential for **optimization**. Concepts like derivatives and gradients are the basis of **Gradient Descent**, the engine that trains neural networks and many other ML models. + +### B. Statistics and Probability +* **Statistical Modeling:** Understanding hypothesis testing, sampling, and probability distributions. +* **Model Evaluation:** Knowing when to use $R^2$ (regression) vs. F1-Score or AUC (classification), and how to interpret confidence intervals. + +## 3. Data-Centric Responsibilities + +MLEs spend a significant portion of their time working with data. + +* **Data Cleaning & Preprocessing:** Handling missing values, transforming categorical variables, and dealing with outliers. +* **Feature Engineering:** The creative process of transforming raw data into features that best represent the underlying problem. **This often has a bigger impact than changing the algorithm.** +* **Pipeline Building:** Creating repeatable, efficient, and monitored data workflows using tools like Apache Airflow or cloud-native solutions. + +## 4. Soft Skills + +:::caution +Do not underestimate soft skills! An ML project involves many different teams. +::: + +* **Communication:** Translating complex technical results into clear, actionable business recommendations. +* **Curiosity and Learning:** The ML field evolves rapidly. 
You must commit to continuous learning of new papers, frameworks, and techniques. +* **A/B Testing and Experimentation:** Designing experiments to rigorously test the real-world impact of your deployed models. diff --git a/sidebars.ts b/sidebars.ts index df3024c..1d8d9fe 100644 --- a/sidebars.ts +++ b/sidebars.ts @@ -260,6 +260,14 @@ const sidebars: SidebarsConfig = { href: "/css/introduction", }, + // Machine Learning Tutorial Structure + + { + type: "link", + label: "Machine Learning", + href: "/machine-learning/introduction", + }, + // JavaScript Tutorial Structure { @@ -1643,6 +1651,257 @@ const sidebars: SidebarsConfig = { // items: [], // }, ], + + ml: [ + "machine-learning/introduction", + "machine-learning/role-of-ml-engineer", + "machine-learning/ml-engineer-vs-ai-engineer", + "machine-learning/skills-and-responsibilities", + "machine-learning/ml-lifecycle", + + { + type: "category", + label: "ML Fundamentals", + link: { + type: "generated-index", + title: "Machine Learning Fundamentals", + description: + "Understand the core concepts of Machine Learning including types of learning, real-world applications, and essential terminology.", + keywords: [ + "machine learning basics", + "ai fundamentals", + "learning types", + "ml introduction", + ], + }, + items: [ + "machine-learning/fundamentals/what-is-ml", + "machine-learning/fundamentals/types-of-learning", + "machine-learning/fundamentals/ml-workflow", + "machine-learning/fundamentals/data-splitting", + ], + }, + + // { + // type: "category", + // label: "Math for ML", + // link: { + // type: "generated-index", + // title: "Essential Math for Machine Learning", + // description: + // "Learn the math behind ML: linear algebra, calculus, statistics, and probability with simplified explanations.", + // keywords: [ + // "math for ml", + // "statistics", + // "probability", + // "linear algebra", + // "calculus in ml", + // ], + // }, + // items: [ + // "machine-learning/math/linear-algebra", + // 
"machine-learning/math/calculus", + // "machine-learning/math/statistics", + // "machine-learning/math/probability", + // ], + // }, + + // { + // type: "category", + // label: "Data Preprocessing", + // link: { + // type: "generated-index", + // title: "Data Cleaning & Preprocessing", + // description: + // "Master the data preparation pipeline — handling missing values, scaling, encoding, feature extraction, and more.", + // keywords: [ + // "ml data preprocessing", + // "data cleaning", + // "feature engineering", + // "ml pipeline", + // ], + // }, + // items: [ + // "machine-learning/data-preprocessing/handling-missing-data", + // "machine-learning/data-preprocessing/feature-scaling", + // "machine-learning/data-preprocessing/encoding", + // "machine-learning/data-preprocessing/feature-engineering", + // ], + // }, + + // { + // type: "category", + // label: "Supervised Learning", + // link: { + // type: "generated-index", + // title: "Supervised Machine Learning", + // description: + // "Learn the full set of supervised ML algorithms — regression, classification, trees, SVMs, and ensembles.", + // keywords: [ + // "supervised learning", + // "regression", + // "classification", + // "svm", + // "decision trees", + // "ensemble models", + // ], + // }, + // items: [ + // { + // type: "category", + // label: "Regression", + // items: [ + // "machine-learning/supervised/regression/linear-regression", + // "machine-learning/supervised/regression/polynomial-regression", + // "machine-learning/supervised/regression/ridge-lasso", + // ], + // }, + // { + // type: "category", + // label: "Classification", + // items: [ + // "machine-learning/supervised/classification/logistic-regression", + // "machine-learning/supervised/classification/knn", + // "machine-learning/supervised/classification/svm", + // "machine-learning/supervised/classification/naive-bayes", + // ], + // }, + // { + // type: "category", + // label: "Tree Models", + // items: [ + // 
"machine-learning/supervised/tree-models/decision-tree", + // "machine-learning/supervised/tree-models/random-forest", + // "machine-learning/supervised/tree-models/gradient-boosting", + // "machine-learning/supervised/tree-models/xgboost", + // ], + // }, + // ], + // }, + + // { + // type: "category", + // label: "Unsupervised Learning", + // link: { + // type: "generated-index", + // title: "Unsupervised Machine Learning", + // description: + // "Explore clustering, dimensionality reduction, and anomaly detection techniques used to uncover hidden patterns.", + // keywords: [ + // "unsupervised learning", + // "clustering", + // "k-means", + // "pca", + // "anomaly detection", + // ], + // }, + // items: [ + // "machine-learning/unsupervised/kmeans", + // "machine-learning/unsupervised/hierarchical-clustering", + // "machine-learning/unsupervised/dbscan", + // "machine-learning/unsupervised/pca", + // "machine-learning/unsupervised/anomaly-detection", + // ], + // }, + + // { + // type: "category", + // label: "Model Evaluation", + // link: { + // type: "generated-index", + // title: "Model Evaluation & Validation", + // description: + // "Learn how to evaluate ML models using metrics, cross-validation, ROC curves, confusion matrices, and more.", + // keywords: [ + // "model evaluation", + // "ml metrics", + // "cross validation", + // "confusion matrix", + // "roc auc", + // ], + // }, + // items: [ + // "machine-learning/evaluation/metrics-regression", + // "machine-learning/evaluation/metrics-classification", + // "machine-learning/evaluation/cross-validation", + // "machine-learning/evaluation/overfitting-underfitting", + // ], + // }, + + // { + // type: "category", + // label: "Neural Networks", + // link: { + // type: "generated-index", + // title: "Neural Networks & Deep Learning", + // description: + // "Understand the foundations of neural networks, activation functions, backpropagation, optimization, and training techniques.", + // keywords: [ + // 
"neural networks", + // "deep learning", + // "activation functions", + // "backpropagation", + // "optimizers", + // ], + // }, + // items: [ + // "machine-learning/neural-networks/perceptron", + // "machine-learning/neural-networks/activation-functions", + // "machine-learning/neural-networks/loss-functions", + // "machine-learning/neural-networks/backpropagation", + // "machine-learning/neural-networks/optimizers", + // ], + // }, + + // { + // type: "category", + // label: "ML Deployment", + // link: { + // type: "generated-index", + // title: "Deploying Machine Learning Models", + // description: + // "Learn how to serve ML models using Flask, FastAPI, Docker, and cloud platforms. Includes versioning and CI/CD.", + // keywords: [ + // "ml deployment", + // "mlops", + // "fastapi", + // "docker", + // "model serving", + // "ci cd", + // ], + // }, + // items: [ + // "machine-learning/deployment/flask", + // "machine-learning/deployment/fastapi", + // "machine-learning/deployment/docker", + // "machine-learning/deployment/mlflow", + // ], + // }, + + // { + // type: "category", + // label: "Project Practicals", + // link: { + // type: "generated-index", + // title: "Machine Learning Project Practicals", + // description: + // "Hands-on ML projects for real-world learning — from EDA to model training, tuning, and deployment.", + // keywords: [ + // "ml projects", + // "ml practicals", + // "machine learning hands on", + // "real world ml", + // ], + // }, + // items: [ + // "machine-learning/projects/eda-project", + // "machine-learning/projects/regression-project", + // "machine-learning/projects/classification-project", + // "machine-learning/projects/clustering-project", + // "machine-learning/projects/deployment-project", + // ], + // }, + ], }; export default sidebars;