Linear regression Module Review #907

Open: wants to merge 10 commits into main

dsbuddy
Collaborator

@dsbuddy dsbuddy commented Mar 21, 2024

No description provided.

@dsbuddy dsbuddy requested a review from rosemm March 21, 2024 15:16
Contributor

@rosemm rosemm left a comment


I put a bunch of comments and suggestions throughout, let me know if anything is unclear or if you'd like further explanation on anything!

***
<div class = "answer">

This question is more difficult than the previous one because it requires the test-taker to have a deeper understanding of the characteristics of linear regression. The test-taker must be able to identify which of the answer choices is not a characteristic of linear regression, even though all of the other answer choices are valid characteristics.
Contributor


One general tip: The quiz questions and answers should be designed for learners to work through independently (not for an instructor to administer, for example). So the follow-up text for a quiz question should do things like provide more context about the correct answer, explain why the other options are incorrect, etc., all with the learner in mind as the audience. It seems like you've written your quiz follow-ups more as notes for an instructor explaining the rationale behind the question.

## What is linear regression?
- Linear regression is a supervised machine learning algorithm that learns to predict a continuous target variable based on one or more predictor variables. Linear regression models the relationship between the target variable and the predictor variables using a linear equation.
- In the case of linear regression, the target variable is a continuous variable. In a supervised learning problem, the machine learning algorithm is given a set of training data and asked to learn a function that maps the input variables to the output variable. The training data consists of pairs of input and output values. The algorithm learns the function by finding the line (or, with several predictors, the plane or hyperplane) that best fits the training data, usually by minimizing the sum of squared differences between predicted and observed values. Once the algorithm has learned the function, it can be used to make predictions on new data: it simply plugs the values of the input variables into the learned function.
- Linear regression is a popular supervised learning algorithm because it is simple to implement and understand. It is also a versatile algorithm that can be used to solve a variety of problems.
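To make the fit-then-predict workflow described above concrete, here is a minimal sketch using scikit-learn's `LinearRegression` on a small synthetic dataset (the data are made up for illustration). It shows that, after fitting, a prediction is just the learned intercept plus the learned coefficient times the input value.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: one predictor x and a continuous target y,
# generated as 2*x + 1 plus a little random noise
rng = np.random.default_rng(0)
x_train = np.linspace(0, 10, 50).reshape(-1, 1)  # scikit-learn expects a 2-D feature array
y_train = 2 * x_train.ravel() + 1 + rng.normal(scale=0.5, size=50)

# "Learning the function": ordinary least squares finds the best-fit line
model = LinearRegression()
model.fit(x_train, y_train)
print("slope:", model.coef_[0], "intercept:", model.intercept_)

# "Making a prediction": plug new input values into the learned line
x_new = np.array([[3.0], [7.5]])
predictions = model.predict(x_new)
manual = model.intercept_ + model.coef_[0] * x_new.ravel()
print(predictions, manual)  # the two arrays match
```

The slope and intercept should land close to the 2 and 1 used to generate the data, but not exactly, because of the added noise.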
Contributor


I recommend avoiding telling learners a topic is simple, or easy to understand -- it risks making them feel inadequate if they don't feel like it's clicking right away.

Suggested change
- Linear regression is a popular supervised learning algorithm because it is simple to implement and understand. It is also a versatile algorithm that can be used to solve a variety of problems.
- Linear regression is a popular supervised learning algorithm because it is computationally simple (even if it's not always simple to interpret!). It is also a versatile algorithm that can be used to solve a variety of problems.

Contributor


I saw you resolved this without changing anything; was that just a mistake, or do you feel strongly about keeping this language?
By the by, here's a handy resource on this topic (we should probably reference that in our authoring guidelines!)

Comment on lines 102 to 104
- **Predicting customer churn:** Linear regression can be used to predict whether a customer is likely to churn based on their past purchase history, demographics, and other factors.
- **Predicting the risk of a customer defaulting on a loan:** Linear regression can be used to predict the risk of a customer defaulting on a loan based on their credit score, income, and other factors.
- **Predicting the likelihood of a patient having a particular disease:** Linear regression can be used to predict the likelihood of a patient having a particular disease based on their medical history, symptoms, and other factors.
Contributor


These three are probably all examples of logistic regression rather than linear regression per se.
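A quick way to see the distinction: when the outcome is binary (for example, churned vs. did not churn), a linear regression fit can predict values below 0 or above 1, while logistic regression returns probabilities between 0 and 1. A minimal sketch on a hypothetical one-predictor dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical binary outcome: 1 when the predictor is positive, else 0
X = np.arange(-5, 6).reshape(-1, 1)  # 11 values from -5 to 5
y = (X.ravel() > 0).astype(int)

X_far = np.array([[-20], [20]])  # inputs outside the training range

# Linear regression treats y as continuous, so its predictions are unbounded
linear = LinearRegression().fit(X, y)
linear_preds = linear.predict(X_far)

# Logistic regression models the probability that y == 1
logistic = LogisticRegression().fit(X, y)
logistic_probs = logistic.predict_proba(X_far)[:, 1]

print("linear:", linear_preds)      # falls outside [0, 1]
print("logistic:", logistic_probs)  # stays within [0, 1]
```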

***


### Applications of linear regression in machine learning
Contributor


I really like the impulse here to show concrete examples and highlight real-world use cases, but I'm concerned that the specific examples here won't necessarily be relevant to our learners.
I think you could make this section much stronger by replacing this list (and the similar one in the next section) with a much shorter but more targeted list of linear regression applications in biomedical research. It would be ideal to find actual published studies using linear regression and link to those.
I know this is a big ask -- I'm happy to help try to find appropriate examples for this!


### Python Implementation of Linear Regression

To implement linear regression in Python using scikit-learn, we can follow these steps:
Contributor


I think this is an excellent example, and it would be great to break it up a little more, especially steps 3-5. I think each of the steps here could probably be its own subsection, with a header, and the explanation you're currently providing via comments could be moved out into regular text accompanying each code chunk.

data.info()
```
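The `data.info()` call above assumes `data` is a pandas DataFrame, but the code that creates it isn't shown in this excerpt. As a hedged sketch, assuming the example uses scikit-learn's bundled diabetes dataset (which the code below prints), it could be loaded as a DataFrame like this:

```python
from sklearn.datasets import load_diabetes

# as_frame=True returns pandas objects; .frame holds all ten features
# plus a "target" column in a single DataFrame
diabetes = load_diabetes(as_frame=True)
data = diabetes.frame

# Summarize column names, non-null counts, and dtypes
data.info()
```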

3. Split the data into training and testing sets:
Contributor


I love the inclusion of this, and I'm thinking this could be something learners may be encountering for the first time here -- cross validation / machine learning techniques are not currently part of the pre-reqs for this module. There are a few different possible ways to approach this, but I think one thing that may work well is to have a new section before this example that explains at a high level some of the implementation stuff you then use in this example. I'd recommend short explanations (especially why we do this) of splitting data into training and test, recoding categorical predictors, scaling continuous predictors, and evaluating model predictions (i.e. what is MSE, conceptually?)
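A minimal sketch of three of those ideas together (splitting into training and test sets, scaling continuous predictors, and evaluating with mean squared error), again assuming the diabetes dataset. MSE is the average of the squared differences between observed and predicted values, so lower is better. The 20% test size, `random_state=42`, and the use of `StandardScaler` are illustrative choices, not necessarily what the module uses. The scaler is fit on the training data only, so no information from the test set leaks into training. (The bundled diabetes features are already mean-centered and scaled, so here the scaling step is mainly illustrative.)

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)

# Hold out 20% of rows to estimate performance on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Learn scaling parameters (mean, std) from the training set only,
# then apply the same transformation to the test set
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = LinearRegression().fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)

# MSE: mean of squared differences between observed and predicted targets
mse = mean_squared_error(y_test, y_pred)
print(f"Test MSE: {mse:.1f}")
```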

print(diabetes)
print(diabetes.DESCR)

# Now we will split the data into the independent and independent variable
Contributor


Suggested change
# Now we will split the data into the independent and independent variable
# Now we will split the data into the independent and dependent variable
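In this setting, the independent variables (predictors, often called `X`) are the ten diabetes features, and the dependent variable (the target, `y`) is the measure of disease progression. Assuming the data sit in a DataFrame with a `target` column, as `load_diabetes(as_frame=True)` provides, the split might look like:

```python
from sklearn.datasets import load_diabetes

data = load_diabetes(as_frame=True).frame

# Independent variables: every column except the target
X = data.drop(columns="target")
# Dependent variable: the quantity we want to predict
y = data["target"]

print(X.shape, y.shape)
```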




### Real World Code Example
Contributor


I think this is great, and I think you could just do this example or the one above; we probably don't need both.


## Conclusion

At the end of the lesson, students should have a good understanding of the concept of linear regression and how to implement the linear regression algorithm in Python. They should also be able to apply linear regression to real-world datasets to make predictions and draw insights.
Contributor


Like all the module text, this should be written with learners as the audience, not instructors.

Comment on lines +30 to +31
- Understand the concept of linear regression and its applications in machine learning
- Learn how to implement the linear regression algorithm in Python
Contributor


These are both pretty big topics, actually, and I'm wondering now if it might be worthwhile splitting this module into two separate modules: "Intro to Linear Regression for Machine Learning", and "Linear Regression in Python".
That will allow you to focus more attention on actually teaching the python code, which would be great. The prereqs don't list specific experience with scikit-learn or anything like that, so we want to write this with a learner in mind who has some python experience but has maybe never done machine learning before. Linear regression is a really natural place to start with modeling, so I love the idea of this module being something someone could work through as their first attempt at machine learning in python.
I'll start a new branch for the intro to linear regression module, and we can keep this one for the python piece.

@rosemm
Contributor

rosemm commented Apr 24, 2024

Hi @dsbuddy ! I took another look and added in some more comments and suggestions, the biggest of which is that we split this into two separate modules (see my comment above). That will give you a lot more space to teach the python piece of it more thoroughly, which I think will be really valuable. I'll start a new branch now for the "intro to regression for ML" module (edit: I did! https://github.com/arcus/education_modules/tree/intro_regression_ml ), and you can update the draft on this linear_regression branch here to just focus on teaching the method in python.
My advice is to read through your "Python Implementation of Linear Regression" section thinking about how a learner would work through that content (Which terms might be new to them? What questions might they have as they go through the code? Which pieces might feel confusing to someone who's never used scikit-learn before?). You may also find it helpful to go back to what you have listed as the prereqs for this module and imagine a learner who is brand new to this topic but does (just barely) meet the prereqs as defined. Keep this imaginary learner in mind as you read through -- if the module would go over that person's head, then we need to adjust the module, the prereqs, or both.
As you realize which bits need more explanation and/or links to relevant resources (some learn-more, options, or help boxes might be appropriate!), start filling those in. I think you'll end up wanting to break that section into multiple subsections as you add more explanation, so feel free to put in subheaders as appropriate, too.

@rosemm
Contributor

rosemm commented Apr 24, 2024

FYI: Here is the PR with the new "Intro to Linear Regression for Machine Learning" module: #923

@rosemm
Contributor

rosemm commented May 3, 2024

Hi @dsbuddy let me know if you have any questions!
