Machine learning workshop with the USDA Agricultural research service #99

Closed
arivers opened this issue May 2, 2019 · 31 comments

@arivers arivers commented May 2, 2019

On April 30 I met with the board about teaming up to put together a 2-day machine learning workshop based on Google's machine learning crash course:

https://developers.google.com/machine-learning/crash-course/

We plan to have 20-30 USDA-ARS participants and provide 3-4 instructors.

We would want help from UF Carpentries in:

  • modifying the curriculum
  • getting additional instructors
  • finding a space to run the workshop on campus

I am also planning to contact Andy Li at the NSF Center for Big Learning on campus about potential partnerships.

Member

@ha0ye ha0ye commented May 4, 2019

@arivers If it's alright with you, I can try and put the word out about your workshop to the UF Informatics-Training listserv to see if there are interested folks.

Also, do you know if you'll need/want HPC support during the workshop? (sorry if this is covered in the course content - I haven't taken a look at that yet)

Author

@arivers arivers commented May 13, 2019

@ha0ye, yes, feel free to put the information on the Informatics-Training listserv.

I don't think we will need HPC support. Most of the datasets for training are smaller.

@MarconiS MarconiS commented May 13, 2019

Hi @arivers
I am interested in participating (either as instructor or helper), but I have quite a lot going on this summer, so I am wondering how much of a time commitment you all expect will be needed to build the material. At first glance that's a LOT of material to cover in 2 days. Thanks!

@stuckyb stuckyb commented May 13, 2019

I agree about not needing HPC support. My feeling is that if this workshop requires HPC resources, we're probably doing it wrong!

Also, I'd be interested in helping out with this workshop, either as instructor or helper.

Author

@arivers arivers commented May 13, 2019

Hi @MarconiS,

I think there are several levels at which someone could participate.

The most involved would be as a co-creator, working with me on modifying the lessons, testing out the material and organizing the workshop.

The next most involved would be as a co-instructor, teaching a chunk of the course, doing the live coding, etc.

The smallest time commitment would be acting as a mentor during the workshop, going around helping students do the exercises and answering questions.

@gklarenberg gklarenberg commented May 13, 2019

@arivers I'd be interested in helping out, but also have some travel going on this summer. Do you have an idea of when to do this workshop?

@MarconiS MarconiS commented May 13, 2019

Gotcha! More than happy at least to be involved as a co-instructor; will be glad to carve some time out for assisting you all as co-creator, if you'll need me :)

Member

@gaurav gaurav commented May 13, 2019

I'm also happy to help out in any way I can!

@hugedata hugedata commented May 13, 2019

Hi @arivers,

I'd be interested in co-creating parts of the course and also as an instructor.

Do you have a time-frame for when the course will take place?

Best,
Dimitri

Author

@arivers arivers commented May 13, 2019

We do not have a date set but were thinking of late August.

@hugedata hugedata commented May 13, 2019

The Fall semester starts very early - August 20. Depends on when it will be easier to reserve space on campus, before or after the semester starts.

@kokbent kokbent commented May 13, 2019

Happy to help out on this.

While this shouldn't need HPC, it is still important to have a good machine to work on. An 8-year-old mid-spec laptop is often not a good idea.

Ben

@Nits11 Nits11 commented May 13, 2019

Hi @arivers ,
I will be interested in helping.
Please let me know around what time you plan to schedule the workshop.
Thanks
Nitya

@hugedata hugedata commented May 13, 2019

Hi @kokbent,

If all participants could have an HPC account, this would make things much easier.

Tools like Keras and Tensorflow are already there, and even GPUs could be used.
Of course participants should not run on the login nodes, but on the dedicated
development nodes.

 Best,
 Dimitri

@Nits11 Nits11 commented May 14, 2019

@kokbent kokbent commented May 14, 2019

> Hi @kokbent,
> If all participants could have an HPC account, this would make things much easier.
>
> Tools like Keras and Tensorflow are already there, and even GPUs could be used.
> Of course participants should not run on the login nodes, but on the dedicated
> development nodes.
> Best,
> Dimitri

Yes, this should be one of the first issues to discuss. I've skimmed through the Google Crash Course, and it seems like the "programming exercises" are done in "Colaboratory", which is a decently skinned Jupyter notebook hosted on Google's servers and interfacing with Google Drive. It seems to be free, and it's what they use to run TensorFlow and even to train neural networks. This could be an attractive option because all we need is a Google account and a browser.

@hugedata hugedata commented May 14, 2019

It's a good starting point. I have done the whole Google TensorFlow crash course, and it was instructive to run the examples from Python as well. The HPC also provides options to run Jupyter notebooks remotely, so this could be set up if needed.

@andorfc andorfc commented May 17, 2019

@arivers,

I would be happy to help review the course material and act as mentor if needed.

-Carson

@sunray1 sunray1 commented May 19, 2019

I'd be glad to help too if you need anything!

Author

@arivers arivers commented May 21, 2019

@ha0ye Do you know who would be the best person to help me look for rooms and dates that are available on campus? I was thinking of doing a larger workshop with about 45-50 participants sometime in August or early September. I know the library had a room, but I think it was for smaller groups. We could also potentially rent space at Emerson Hall. Also, are there dates that would be good to avoid based on the UF calendar?

Member

@ha0ye ha0ye commented May 22, 2019

@arivers Flora and Alethea (whose emails you should have) would be good to go through first. I'm not sure what the spaces are like in the library, but we were able to reserve some of the CSE lab spaces / classrooms last summer that were big enough.

@magitz magitz commented May 23, 2019

Kind of late in joining this, but I'd be happy to help co-instruct this. Has there been progress on working on the curriculum? Might be able to help with that if needed.

Matt

Author

@arivers arivers commented Jun 4, 2019

We have a tentative date and location for the course of August 27-28 at the Reitz Union.

I would like to meet Tuesday, June 11, 2019 at 10:00 AM at the USDA-ARS Lab 1600 SW 23 Dr, Gainesville, FL 32608 to discuss proposed changes to the Google course curriculum for our workshop. Park anywhere onsite and stop by the small administration building (No. 30) between the two large brick buildings to be directed to the conference room.

You can also join by Webex
Meeting number: 961 217 196
https://ars-usda.webex.com/ars-usda/j.php?MTID=m2f4e9d3596cc6ae20546cf6ff66549e5
Join by phone
1-888-8449904 Call-in toll-free number (ATT Audio Conference)
1-816-4234261 Call-in number (ATT Audio Conference)
481 288 6 Access Code

Key topics discussed will be:

  • How appropriate is the Google material
  • The intended audience (people with some python and stats experience)
  • The pacing
  • tensorflow vs scikit-learn
  • suggestions on modifying the format or material for our students

For reference the link to the Google course is here:
https://developers.google.com/machine-learning/crash-course/

Please comment if you intend to come or if you want to participate but cannot come.

The proposed schedule for the course is below.

August 27

Time   Lesson
8:00   Framing ML problems
8:20   Getting started: linear regression and loss
8:40   Reducing loss: iteration, gradient descent, learning rate, stochastic gradient descent
9:40   Getting started with TensorFlow and scikit-learn
10:40  Generalization and the bias-variance tradeoff
11:00  Break
11:30  Splitting training and test data sets
12:00  Lunch break (participants purchase their own meals)
1:30   Validation data sets
2:15   Representations: feature selection, engineering, data cleaning
3:15   Feature crosses: encoding non-linearity
4:15   Regularization: simplicity (L2)
5:15   Adjourn

August 28

Time   Lesson
8:00   Logistic regression
8:30   Classification: thresholding
8:40   Classification: true vs. false and positive vs. negative
8:50   Classification: accuracy
9:00   Classification: precision and recall
9:15   Classification: precision-recall and receiver operating characteristic (ROC) curves
9:45   Classification: prediction bias
10:00  Regularization: sparsity (L1)
11:00  Neural networks
12:00  Lunch break (participants purchase their own meals)
1:30   Training neural nets
2:30   Embeddings
3:30   Overview of methods: classification, regression, clustering, dimensionality reduction
4:30   Resources for doing ML in your lab when you leave
5:00   Answering final questions
5:30   Adjourn

@stuckyb stuckyb commented Jun 4, 2019

I'll be there!

@Nits11 Nits11 commented Jun 6, 2019

@hugedata hugedata commented Jun 7, 2019

Hi Adam,

I am planning to attend.

Best,
Dimitri

Author

@arivers arivers commented Jun 11, 2019

As a reminder, we are meeting at 10:00AM today at the USDA-ARS Lab 1600 SW 23 Dr, Gainesville, FL 32608 to discuss proposed changes to the Google course curriculum for our workshop. Park anywhere onsite and stop by the small administration building (No. 30) between the two large brick buildings to be directed to the conference room.

You can also join by Webex
Meeting number: 961 217 196
https://ars-usda.webex.com/ars-usda/j.php?MTID=m2f4e9d3596cc6ae20546cf6ff66549e5
Join by phone
1-888-8449904 Call-in toll-free number (ATT Audio Conference)
1-816-4234261 Call-in number (ATT Audio Conference)
481 288 6 Access Code

For the next meeting we will schedule a time that works for all interested people.

@kokbent kokbent commented Jun 11, 2019

I could not join the discussion because I needed to attend an interview. Will there be brief minutes of the meeting?

Author

@arivers arivers commented Jun 11, 2019

Author

@arivers arivers commented Jun 14, 2019

I'm moving this coordination discussion to a repository specific to the ML training class so we can divide up issues into multiple threads. The new repo is here:

USDA-ARS-GBRU/ml-training-site#1

Please follow that Repository to get updates. The results of our first meeting are on that repository in the issues and wiki sections.

@ha0ye ha0ye closed this Jun 14, 2019

Author

@arivers arivers commented Jul 22, 2019

We have set up a site and curriculum for the ML course, which will run on August 27-28: https://usda-ars-gbru.github.io/ml-training-site/

We are still looking for a few more people who are interested in helping or teaching a small module for the course. Please let me know if you are interested and fill out this Doodle poll of potential times you could meet over the next month: https://doodle.com/poll/wtbzz4mafgnciffx . Even if you cannot meet, there are other ways to get involved.

@ha0ye ha0ye reopened this Jul 24, 2019
@ha0ye ha0ye closed this Sep 10, 2019