While discussing the ongoing digitalization of the agricultural industry with a friend over Christmas, I started to wonder whether I could get access to some agricultural production data myself. Luckily, my partner's uncle is a dairy farmer and has been using a milking robot for the past couple of years on his small dairy farm in Heusden, the Netherlands.
Starting from a backup of a database generated by a DeLaval milking system, I set up a Docker container on my MacBook to run a local SQL server so that I could access the data. Initially I used Azure Data Studio to explore the database and find the right tables for an exploratory data analysis (EDA). In this project, however, we will connect directly to the database running on this SQL server using the sqlalchemy package.
Using an Amazon S3 bucket and RDS, I managed to host the database in the cloud. Any further analyses will be done by connecting to this database in the cloud.
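
As a rough illustration of what connecting with sqlalchemy could look like, here is a minimal sketch. The environment variable name `DAIRY_DB_URL` is a hypothetical placeholder, not something defined in this repo, and the real connection URL depends on the database engine, driver, and RDS endpoint. When the variable is not set, the sketch falls back to an in-memory SQLite database purely so that it runs anywhere.

```python
import os

from sqlalchemy import create_engine, text

# DAIRY_DB_URL is a hypothetical environment variable holding the connection
# URL of the hosted database. Keeping it out of the code avoids committing
# credentials to the repo.
# Fallback: an in-memory SQLite database, used only so this sketch is runnable.
db_url = os.environ.get("DAIRY_DB_URL", "sqlite:///:memory:")

# The engine manages connections; nothing is opened until it is first used.
engine = create_engine(db_url)

with engine.connect() as conn:
    # A trivial round-trip query to confirm that the connection works.
    result = conn.execute(text("SELECT 1")).scalar()

print("Connection OK:", result == 1)
```

Once the connection works, the same `engine` object can be reused for the queries that feed the analyses.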
This repo will contain several applications of machine learning, such as:
- Yield prediction: Start with linear regression, write a gradient descent algorithm by hand, and then apply Ridge/Lasso regularization to find out what drives milk production and to build a model that predicts the amount of milk given per cow.
- Invalid Yield prediction: It is possible for a cow not to give milk in the machine, which results in an Invalid Yield reading in the database. Using classification algorithms such as XGBoost and neural networks (using TensorFlow), we'll set up a classifier to predict these readings.
- Predictive Maintenance: As with the previous bullet, we can use the data in the database to predict when the DeLaval machine will break down. We will use similar techniques and evaluate which classifier performs best.
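
To give an idea of the hand-written gradient descent mentioned in the yield prediction bullet, here is a minimal sketch using NumPy. It is not the final implementation of this repo: the function name `fit_linear_gd`, its parameters, and the synthetic data are all assumptions made for illustration, and no real farm data is used. The optional `l2` argument adds a ridge penalty; Lasso is left out because its penalty is not differentiable at zero and needs a different update rule.

```python
import numpy as np


def fit_linear_gd(X, y, lr=0.1, n_iter=2000, l2=0.0):
    """Fit y ~ X @ w + b with batch gradient descent on the mean squared error.

    l2 > 0 adds a ridge penalty l2 * ||w||^2 on the weights (not on the bias).
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        residual = X @ w + b - y
        # Gradients of the (penalized) mean squared error.
        grad_w = (2.0 / n_samples) * (X.T @ residual) + 2.0 * l2 * w
        grad_b = (2.0 / n_samples) * residual.sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Synthetic stand-in data: two standardized, made-up features and a target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w = np.array([3.0, -1.5])
y = X @ true_w + 10.0 + rng.normal(scale=0.1, size=200)

w, b = fit_linear_gd(X, y)                  # plain linear regression
w_ridge, b_ridge = fit_linear_gd(X, y, l2=1.0)  # with ridge penalty

print("weights:", w, "bias:", b)
print("ridge weights:", w_ridge)
```

On real data the features should be standardized first, since gradient descent with a fixed learning rate is sensitive to feature scale.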