# Herd Analysis Report

## Background

Dairy producers of all sizes are under ever-present economic pressure to produce more with less to meet the global demand for dairy products.  As a result, farmers need to carefully monitor the environments of their cattle to prevent injury, encourage production, and stop the spread of disease.  Careful examination of input factors such as genetics, nutrition, climate, facilities, and negative health events may therefore provide actionable insights into ways to modify operations and improve the health, well-being, and production of their herds.

## Report Objective

This report is intended to provide insight and guidance aimed at improving the herd health and milk production volumes of a single herd in Franklin County, Pennsylvania.  The current scope of analysis can be grouped into the following areas: facilities and weather, herd composition, and nutrition.  As a greater breadth of data becomes available, additional insights may be drawn.  This analysis focuses on milk production data for calendar years 2016 and 2017.

### Findings: Milk Production History

An estimated 522,244 gallons of milk (4,493,019 milk-pounds) were produced between January 1, 2016 and December 31, 2017.  The total milk-pounds produced per month ranged from 313,223 milk-pounds (36,421 gallons) to 395,524 milk-pounds (45,991 gallons).  The chart below summarizes total herd production by month for the two-year period.

<div class='col-md-12'><img style="height:auto" src="figures/herd-total-milk-by-month.png"></div>

For calendar years 2016 and 2017, the total number of animals milked per month ranged from 141 to 155 cows.  The plot below shows the total number of animals milked per month.  This number has an upper limit dictated by the size of the milking and housing facilities.  The target number of actively milking cows for the facilities in this analysis is 155 per month.

<div class='col-md-12'><img style="height:auto" src="figures/count-cows-milked-per-month.png"></div>

### Findings: Performance by Number of Lactations

Older cattle, having gone through at least three lactations, outperformed younger cattle by 17.5% over the first 305 days after calving, averaging 24,435 milk-pounds (2,841 gallons) compared to 20,789 milk-pounds (2,417 gallons).  The performance gap is most pronounced in the 20-100 day post-calving range and gradually narrows to zero by approximately 400 days after calving.  The wide range of milk weights beyond 400 days is believed to be the result of incomplete calving data.  The visualization below shows the average milk weight produced after calving for aged cows versus cows that are less than 36 months old.

<div class='col-md-12'>
    <div class='col-md-12'><img style="height:auto" src="figures/milk-production-after-calving.png"></div>
</div>

### Findings: Calvings per Month

To continue regular operations with staggered dry periods, calvings continued throughout the 2016 and 2017 calendar years.  The monthly average number of calves born was 16.75, with 203 calves born in 2016 and 199 calves born in 2017.  The plot below indicates that a steady stream of calvings allowed cows to have a regular 2-3 month dry period after each lactation while permitting the herd overall to maintain a relatively consistent output for year-round income.

<div class='col-md-12'>
    <div class='col-md-12'><img style="height:auto" src="figures/calvings-per-month.png"></div>
</div>

### Findings: Top Producers 2016-2017

For the 2016 and 2017 calendar years, the top 10% of animals by total milk volume are pictured below.  Animals were only considered for this analysis if they had produced milk for 400 or more days.  Among those cows, the top 25% produced at least 44,050 pounds of milk and the top 50% produced at least 39,343 pounds of milk in the same two-year period.  The plot below highlights the top 10 producing animals by total milk volume in that two-year period.  As high producers, these animals should be retained, and animals with similar profiles should be added to the herd to better ensure higher milk volumes and associated financial success.

<div class='col-md-12'><img style="height:auto" src="figures/top-producers.png"></div>

### Findings: Under Performers, 2016-2017

For the 2016 and 2017 calendar years, the bottom 10% of animals by total milk volume are pictured below.  These animals were milked for more than 400 days in that two-year period and produced at most 30,398 milk-pounds.  These animals should be evaluated for medical conditions and prioritized for replacement with cows capable of producing higher milk weights.

<div class='col-md-12'><img style="height:auto" src="figures/bottom-producers.png"></div>
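The thresholds quoted in the two sections above (top 25%, top 50%, bottom 10%) can be read as percentile cut points over the cows that met the 400-day eligibility rule. The following is a minimal sketch of that calculation, not the code used for this report: the records are invented for illustration, the `>=` comparison and the `inclusive` quantile method are assumptions, and only the Python standard library is used.

```python
from statistics import quantiles

# Eligibility rule from the report; ">=" is an assumption, since the text
# says both "400 or more days" and "more than 400 days".
MIN_DAYS_MILKED = 400

# Hypothetical (cow_id, days_milked, total_milk_pounds) records.
records = [
    (101, 610, 47_200.0),
    (102, 395, 21_000.0),  # excluded: fewer than 400 days milked
    (103, 540, 39_100.0),
    (104, 480, 30_100.0),
    (105, 700, 44_900.0),
    (106, 455, 36_400.0),
    (107, 120, 9_800.0),   # excluded: fewer than 400 days milked
    (108, 655, 41_700.0),
]

eligible = [total for _, days, total in records if days >= MIN_DAYS_MILKED]

# n=20 yields 19 cut points in 5% steps; index i holds the (i + 1) * 5th percentile.
cuts = quantiles(eligible, n=20, method="inclusive")
bottom_10_ceiling = cuts[1]   # 10th percentile: bottom 10% produced at most this
top_50_floor = cuts[9]        # 50th percentile: top 50% produced at least this
top_25_floor = cuts[14]       # 75th percentile: top 25% produced at least this

print(f"Eligible cows: {len(eligible)}")
print(f"Bottom 10% ceiling: {bottom_10_ceiling:,.0f} milk-pounds")
print(f"Top 50% floor: {top_50_floor:,.0f} milk-pounds")
print(f"Top 25% floor: {top_25_floor:,.0f} milk-pounds")
```

Filtering before computing the cut points matters: animals with short milking histories would otherwise dominate the bottom of the ranking regardless of their daily performance.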

### Findings: Cluster Analysis

Principal Component Analysis and K-Means Clustering did not identify any strong clustering, as indicated by a low silhouette score across many combinations of selected features.  However, a significant number of low-performing animals were concentrated in a single cluster, and this weak clustering does give some confidence that additional insights can be garnered from this analysis.

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>milk_weight_predicted</th>
      <th>milk_weight</th>
      <th>days_since_calving</th>
      <th>age_in_months</th>
      <th>dairy_form</th>
      <th>udder_score_aggregate</th>
      <th>dairy_strength_aggregate</th>
      <th>final_score</th>
      <th>breed_age_average</th>
      <th>milk_score</th>
      <th>ctpi</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>40028</th>
      <td>70.850000</td>
      <td>83.6</td>
      <td>130.0</td>
      <td>40.0</td>
      <td>24.0</td>
      <td>79.0</td>
      <td>78.0</td>
      <td>78.0</td>
      <td>100.900002</td>
      <td>-238</td>
      <td>1643</td>
    </tr>
    <tr>
      <th>53085</th>
      <td>96.778571</td>
      <td>112.4</td>
      <td>127.0</td>
      <td>80.0</td>
      <td>29.0</td>
      <td>86.0</td>
      <td>85.0</td>
      <td>85.0</td>
      <td>106.000000</td>
      <td>-84</td>
      <td>1618</td>
    </tr>
    <tr>
      <th>11013</th>
      <td>53.185714</td>
      <td>44.1</td>
      <td>197.0</td>
      <td>65.0</td>
      <td>45.0</td>
      <td>84.0</td>
      <td>88.0</td>
      <td>85.0</td>
      <td>106.000000</td>
      <td>-608</td>
      <td>1493</td>
    </tr>
    <tr>
      <th>5471</th>
      <td>55.571429</td>
      <td>68.1</td>
      <td>225.0</td>
      <td>72.0</td>
      <td>35.0</td>
      <td>83.0</td>
      <td>86.0</td>
      <td>85.0</td>
      <td>107.199997</td>
      <td>31</td>
      <td>1534</td>
    </tr>
    <tr>
      <th>22920</th>
      <td>84.721429</td>
      <td>92.0</td>
      <td>236.0</td>
      <td>58.0</td>
      <td>31.0</td>
      <td>83.0</td>
      <td>86.0</td>
      <td>83.0</td>
      <td>107.199997</td>
      <td>805</td>
      <td>1950</td>
    </tr>
    <tr>
      <th>43042</th>
      <td>107.300000</td>
      <td>111.5</td>
      <td>44.0</td>
      <td>38.0</td>
      <td>25.0</td>
      <td>74.0</td>
      <td>83.0</td>
      <td>77.0</td>
      <td>99.800003</td>
      <td>330</td>
      <td>1648</td>
    </tr>
    <tr>
      <th>56567</th>
      <td>51.042857</td>
      <td>50.7</td>
      <td>273.0</td>
      <td>68.0</td>
      <td>38.0</td>
      <td>86.0</td>
      <td>85.0</td>
      <td>85.0</td>
      <td>107.199997</td>
      <td>-36</td>
      <td>1567</td>
    </tr>
    <tr>
      <th>32328</th>
      <td>86.228571</td>
      <td>85.6</td>
      <td>152.0</td>
      <td>43.0</td>
      <td>28.0</td>
      <td>79.0</td>
      <td>78.0</td>
      <td>79.0</td>
      <td>101.199997</td>
      <td>1004</td>
      <td>1761</td>
    </tr>
    <tr>
      <th>45684</th>
      <td>57.971429</td>
      <td>46.8</td>
      <td>264.0</td>
      <td>-18.0</td>
      <td>35.0</td>
      <td>81.0</td>
      <td>87.0</td>
      <td>85.0</td>
      <td>106.000000</td>
      <td>-338</td>
      <td>1520</td>
    </tr>
    <tr>
      <th>30909</th>
      <td>83.071429</td>
      <td>86.6</td>
      <td>17.0</td>
      <td>37.0</td>
      <td>23.0</td>
      <td>85.0</td>
      <td>82.0</td>
      <td>82.0</td>
      <td>103.599998</td>
      <td>486</td>
      <td>1835</td>
    </tr>
  </tbody>
</table>

<div class='col-md-12'>
    <div class='col-md-6'><img style="height:auto" src="figures/silhouette-score.png"></div>
    <div class='col-md-6'><img style="height:325px" src="figures/kmeans-clusters.png"></div>
</div>
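For readers unfamiliar with the silhouette score referenced above, the sketch below computes it from its definition using only the Python standard library. It is an illustration of the metric, not the implementation used for this report, and the sample points and labels are invented.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for single-member clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself is in the sum but not the count)
        a = sum(dist(point, other) for other in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            mean(dist(point, other) for other in members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denominator = max(a, b)
        scores.append((b - a) / denominator if denominator else 0.0)
    return mean(scores)


# Hypothetical two-feature points forming two tight, well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
well_separated = silhouette_score(points, [0, 0, 0, 1, 1, 1])
poorly_separated = silhouette_score(points, [0, 1, 0, 1, 0, 1])
print(f"well separated: {well_separated:.2f}, poorly separated: {poorly_separated:.2f}")
```

Scores near 1 indicate compact, well-separated clusters, while scores near 0 or below indicate overlapping or arbitrary assignments, which is the pattern reported for this herd.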

At this time, no significant findings were obtained from the cluster analysis. Future iterations may reveal more insights.  The following modifications of the analysis should be considered as data becomes available:

- Apples-to-apples comparison of animals only including milking performance in their first or second calvings
- Identification of top performing animals based upon lifetime production value including milk-weight, somatic cell count, and butterfat.
- Inclusion of health event statistics, including illnesses, injuries, breeding metrics

### Findings: Facilities and Climate

In 2015, 2016, and 2017, Franklin County experienced a total of 6 days on which the average maximum temperature across 3 weather stations exceeded 90 °F, and 33 days on which a low temperature below 10 °F was recorded.  Milk production during and immediately after the extreme-temperature days showed no statistically significant change.  As a result, current data suggest that existing facilities and practices are sufficiently effective for heat and cold abatement.  The current recommendation is to maintain existing ventilation, cooling, insulation, and heating strategies.  Additional capital investment to improve these facilities and practices beyond regular maintenance may not lead to improved milk volumes.
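The extreme-temperature day counts above can be illustrated with a short sketch. It assumes that both the daily high and the daily low are averaged across stations before being compared with the thresholds (the text only states this explicitly for the high), and the station names and readings are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

HOT_THRESHOLD_F = 90   # days with an average maximum above this count as hot
COLD_THRESHOLD_F = 10  # days with an average minimum below this count as cold

# Hypothetical (date, station, tmax_f, tmin_f) readings from three stations.
readings = [
    ("2016-07-25", "A", 93, 70), ("2016-07-25", "B", 91, 69), ("2016-07-25", "C", 89, 71),
    ("2016-07-26", "A", 90, 68), ("2016-07-26", "B", 88, 66), ("2016-07-26", "C", 89, 67),
    ("2017-01-08", "A", 22, 9), ("2017-01-08", "B", 20, 7), ("2017-01-08", "C", 21, 11),
]

# Group the per-station observations by calendar date.
by_date = defaultdict(list)
for date, _station, tmax, tmin in readings:
    by_date[date].append((tmax, tmin))

hot_days = sorted(
    date for date, obs in by_date.items()
    if mean(tmax for tmax, _ in obs) > HOT_THRESHOLD_F
)
cold_days = sorted(
    date for date, obs in by_date.items()
    if mean(tmin for _, tmin in obs) < COLD_THRESHOLD_F
)

print("Hot days:", hot_days)
print("Cold days:", cold_days)
```

The resulting date lists could then be joined against daily herd totals to compare production on and just after extreme days with production on ordinary days.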

### Findings: Predicting Daily Milk Weights



The following content is planned for this section:

- A long description of the purpose of the prediction
- An example of the 'dateless' milk profiles that were created

<div class='col-md-12'>
    <div class='col-md-12'><img style="height:auto" src="figures/regression-data-example.png"></div>
</div>

- A brief description of model performance
- Images of model performance
- Implications
- Predicting 2018 milk weights

---

## Appendix
### Appendix: Selected Terms

The following terms provide additional context for readers unfamiliar with the dairy industry.

#### Milk Weight (milk-pounds)

The amount of milk produced by an animal. Measured in pounds of milk. For reference, a gallon of milk weighs approximately 8.6 pounds.
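The conversion used throughout this report can be expressed as a one-line function. The sketch below applies the approximate 8.6 pounds-per-gallon factor stated above to the monthly extremes from the Milk Production History findings; the function name is illustrative.

```python
# Approximate factor stated in this appendix: one gallon of milk weighs about 8.6 pounds.
POUNDS_PER_GALLON = 8.6


def milk_pounds_to_gallons(milk_pounds: float) -> float:
    """Convert a milk weight in pounds to an approximate volume in gallons."""
    return milk_pounds / POUNDS_PER_GALLON


# Lowest and highest monthly herd totals reported for 2016-2017.
for pounds in (313_223, 395_524):
    gallons = milk_pounds_to_gallons(pounds)
    print(f"{pounds:,} milk-pounds is about {gallons:,.0f} gallons")
```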

#### Dry Period

The period when a cow is not producing milk. Often serves as a time of rest following a lactation period.

#### Lactation Period

The period when a cow is producing milk.

#### Days Since Calving

The number of days that have passed since a cow has given birth.

#### Linear Classification Score

An integer score from 50 to 99 given to a milk cow, providing a numerical representation of how well the physical attributes of an animal fit the profile of an 'ideal' milking cow. It is a weighted summary of 18 or more assessments of a given animal.

### Appendix: Data Pipeline - Milk Weight

#### Description

Daily milk-pound production data was derived from the on-site storage of the DeLaval - ALPRO™ herd management system, from files such as [this](../references/example_files/milk_volume_example.txt).  Daily log files were collected for a date range spanning July 2015 to December 2017.  Approximately 15 files were corrupted, and those log files were not retained.  The results of each milking session are captured in daily system logs, stored as a series of text files in local storage.  The following lines provide an example of relevant data elements:

``` txt
04:52:14    R    1831    Cow    Duration1    6:25
04:52:14    R    1831    Cow    AverFlow1    3.6
04:52:14    R    1831    Cow    PeakFlow1    4.8
04:52:14    R    1831    Cow    MilkToday1    23.2
```

The lines above indicate that Cow #1831 produced 23.2 pounds of milk in six minutes and twenty-five seconds, with an average flow rate of 3.6 lb/min and a peak flow of 4.8 lb/min.  This milking was logged at 4:52:14 am.
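A minimal sketch of how log lines in this layout could be grouped into one record per milking is shown below. It is not the contents of `parse_milk_volume.py`; the function names are hypothetical, and it assumes every relevant line has the six whitespace-separated fields seen in the example.

```python
from collections import defaultdict

SAMPLE_LOG = """\
04:52:14    R    1831    Cow    Duration1    6:25
04:52:14    R    1831    Cow    AverFlow1    3.6
04:52:14    R    1831    Cow    PeakFlow1    4.8
04:52:14    R    1831    Cow    MilkToday1    23.2
"""


def parse_milking_log(text):
    """Group log lines into one dict of metrics per (time, cow_id) pair."""
    sessions = defaultdict(dict)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6 or fields[3] != "Cow":
            continue  # skip anything that does not match the assumed layout
        time, _record_type, cow_id, _entity, metric, value = fields
        sessions[(time, int(cow_id))][metric] = value
    return dict(sessions)


def duration_seconds(value):
    """Convert a 'minutes:seconds' duration such as '6:25' to seconds."""
    minutes, seconds = value.split(":")
    return int(minutes) * 60 + int(seconds)


sessions = parse_milking_log(SAMPLE_LOG)
session = sessions[("04:52:14", 1831)]
print(float(session["MilkToday1"]), "lb in", duration_seconds(session["Duration1"]), "seconds")
```

Keying on both time and cow ID keeps multiple milkings of the same animal on the same day separate, which matters if daily totals are later summed per cow.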

#### Raw Data Acquisition

The system logs were manually retrieved from the herd management system and uploaded into secured private storage utilizing Amazon Web Services (AWS) for on-demand, repeatable retrieval, processing, and backup.

#### Data Wrangling

Prior to analysis, the contents of each log file are [downloaded via script](../scripts/get_data.py) from AWS and brought into local storage. Each file is [processed individually](../scripts/parse_milk_volume.py) and put into [local storage](../scripts/load_database.py) for future analysis.

#### Future Improvements

This process can be improved through automated retrieval, ingestion, and cleansing of daily milk production data.  This would require connecting the herd management system to an active network and creating automated scripts to upload production data daily.

### Appendix: Data Pipeline - Linear Classification Score

#### Description

Linear Classification Scores provide a periodic assessment of the physical attributes of a given animal.  Animals are scored on a scale from 50 to 99 by comparing their measured physical characteristics against those of the 'ideal' milking cow.  These [Linear Classification Reports](http://www.holsteinusa.com/programs_services/classification.html) were prepared by a representative of [Holstein Association USA](http://www.holsteinusa.com/) between August 5, 2014, and July 10, 2017.

``` txt
8/5/14,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
BARN_ID,AGE,LAC,DATE_CALVED,ST,SR,BD,DF,RA,RW,LS,RL,LO,FA,FU,UH,UW,UC,UD,TP,RT,TL,UT,CS,FC,DS,RP,FL,MS,FS,E,%BAA
1485,7-Jul,6,7/10/14,50,45,44,42,25,35,29,25,25,25,14,26,35,50,5,35,35,35,35,25,92,92,92,82,80,85,,106
1542,9-Jun,4,8/6/13,50,35,35,42,15,35,50,25,,25,35,36,35,35,40,25,25,25,26,17,93,93,90,84,93,91,2,113.5
```

The example above shows the scoring for cows #1485 and #1542.  The animals received final scores (column `FS`) of 85 and 91, respectively.  The assessment occurred on August 5, 2014.
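The layout above (a date row, then a header row, then one row per animal) can be read with Python's `csv` module. The sketch below is one possible approach under that assumption; the function name and the added `CLASSIFIED_ON` key are hypothetical, and values such as the `AGE` entries are kept as raw strings rather than interpreted.

```python
import csv
import io
from datetime import datetime

SAMPLE = """\
8/5/14,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
BARN_ID,AGE,LAC,DATE_CALVED,ST,SR,BD,DF,RA,RW,LS,RL,LO,FA,FU,UH,UW,UC,UD,TP,RT,TL,UT,CS,FC,DS,RP,FL,MS,FS,E,%BAA
1485,7-Jul,6,7/10/14,50,45,44,42,25,35,29,25,25,25,14,26,35,50,5,35,35,35,35,25,92,92,92,82,80,85,,106
1542,9-Jun,4,8/6/13,50,35,35,42,15,35,50,25,,25,35,36,35,35,40,25,25,25,26,17,93,93,90,84,93,91,2,113.5
"""


def parse_classification(text):
    """Return one dict per animal, tagged with the classification date."""
    rows = list(csv.reader(io.StringIO(text, newline="")))
    # Row 1 carries only the assessment date; row 2 is the column header.
    classified_on = datetime.strptime(rows[0][0], "%m/%d/%y").date()
    header = rows[1]
    return [
        dict(zip(header, row), CLASSIFIED_ON=classified_on)
        for row in rows[2:]
        if row
    ]


records = parse_classification(SAMPLE)
for record in records:
    print(record["BARN_ID"], record["FS"], record["CLASSIFIED_ON"].isoformat())
```

Carrying the date onto each record makes it possible to combine several reports into one table without losing track of when each score was assigned.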

#### Raw Data Acquisition

The classification reports were received on paper.  The reports were scanned to PDF and parsed into [csv files](../references/example_files/classification_example.csv) using the program [PDF Element by Wondershare](https://pdf.wondershare.com/).  The resulting CSVs were uploaded to a private AWS S3 bucket for on-demand, repeatable retrieval via [script](../scripts/get_data.py).

#### Future Improvements

This process can be improved through integration with the Holstein USA online systems.  A software integration was explored early in the process but was abandoned because of a cost-prohibitive pay-per-drink pricing model, charged per animal, per classification.  In the event of further automation, an alternative data acquisition process would be required to prevent the analysis from becoming too costly.

### Appendix: Data Pipeline - Genetic Evaluations

#### Description

[Holstein Association USA](http://www.holsteinusa.com/) conducts additional analysis of individual animal genetics based on available pedigree data, genomic sequencing, and actual production information from the animal and its genetic siblings, where available.  CTPI and Milk are two values from this report: [CTPI](http://www.holsteinusa.com/genetic_evaluations/ss_tpi_formula.html) is an aggregated indicator of milking performance, and Milk is an indicator focused solely on the likelihood of higher volumes of milk production.  In both cases, higher values are more favorable.

``` csv
ANIMAL_ID,NAME,FS,PRO,%P,Fat,%F,Rel,Milk,SCS,PL,DPR,TYPE,REL,UDC,FLC,CTPI
1999 ,"     BELSHWAY PLANET 1999
     USA 71404944100-NA12/12/2012",86 ,49,-0.02,40,-0.09,50 ,1772,2.94 ,3.4,0.2,1.34,53 ,0.55,-0.29 ,2198
2043 ,"     BELSHWAY MASSEY 2043
     USA 72758233100-NA 06/26/2013",79 ,36,0.01,38,-0.01,47 ,1132,2.76 ,3.6,-0.2,0.94,53 ,0.66,0.94 ,2150
```

The example above indicates that the animal with ID 1999 had a Milk indicator of 1772 and a CTPI of 2198.  Cow #2043 had a Milk indicator of 1132 and a CTPI of 2150.
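Because the `NAME` field spans two lines inside quotes and several values carry trailing spaces, a naive line-by-line split would misread this file. The sketch below shows one way to read it with `csv.DictReader`, which handles quoted line breaks; the function name and output keys are hypothetical, and only the fields discussed above are extracted.

```python
import csv
import io

SAMPLE = '''\
ANIMAL_ID,NAME,FS,PRO,%P,Fat,%F,Rel,Milk,SCS,PL,DPR,TYPE,REL,UDC,FLC,CTPI
1999 ,"     BELSHWAY PLANET 1999
     USA 71404944100-NA12/12/2012",86 ,49,-0.02,40,-0.09,50 ,1772,2.94 ,3.4,0.2,1.34,53 ,0.55,-0.29 ,2198
2043 ,"     BELSHWAY MASSEY 2043
     USA 72758233100-NA 06/26/2013",79 ,36,0.01,38,-0.01,47 ,1132,2.76 ,3.6,-0.2,0.94,53 ,0.66,0.94 ,2150
'''


def parse_genetic_evaluations(text):
    """Extract the animal ID, name, Milk, and CTPI values from the report."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    records = []
    for row in reader:
        # Collapse padding and embedded line breaks left over from the PDF export.
        clean = {key: " ".join(value.split()) for key, value in row.items()}
        records.append(
            {
                "animal_id": int(clean["ANIMAL_ID"]),
                "name": clean["NAME"],
                "milk": int(clean["Milk"]),
                "ctpi": int(clean["CTPI"]),
            }
        )
    return records


evaluations = parse_genetic_evaluations(SAMPLE)
for evaluation in evaluations:
    print(evaluation["animal_id"], evaluation["milk"], evaluation["ctpi"])
```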

#### Raw Data Acquisition

The genetic evaluations were received on paper.  The reports were scanned to PDF and parsed into csv files using the program [PDF Element by Wondershare](https://pdf.wondershare.com/).  The resulting CSVs were uploaded to a private AWS S3 bucket for on-demand, repeatable retrieval via [script](../scripts/get_data.py).

#### Future Improvements

This portion of the data pipeline can be improved through integration with the Holstein USA online systems.  A software integration was explored early in the process but was abandoned because of a cost-prohibitive pay-per-drink pricing model.  The expected format of the [CTPI](http://www.holsteinusa.com/genetic_evaluations/Topctpi.html) report is available online.  In the event of further automation, an alternative data acquisition process would be required to prevent this analysis from becoming too costly.

### Appendix: Data Pipeline - Weather Data

#### Description

The weather data set consists of daily summaries of weather measurements for Franklin County, Pennsylvania such as low temperature, high temperature, and total precipitation.  The following provides an example of the csv file format.

``` csv
STATION,NAME,LATITUDE,LONGITUDE,ELEVATION,DATE,PRCP,PRCP_ATTRIBUTES,SNOW,SNOW_ATTRIBUTES,SNWD,SNWD_ATTRIBUTES,TMAX,TMAX_ATTRIBUTES,TMIN,TMIN_ATTRIBUTES,TOBS,TOBS_ATTRIBUTES,WESD,WESD_ATTRIBUTES,WESF,WESF_ATTRIBUTES,WT01,WT01_ATTRIBUTES,WT03,WT03_ATTRIBUTES,WT04,WT04_ATTRIBUTES,WT06,WT06_ATTRIBUTES,WT11,WT11_ATTRIBUTES
USC00361354,"CHAMBERSBURG, PA US",39.9353,-77.6394,195.1,2016-01-01,0,",,7,2100",0,",,7",0,",,7",38,",,7",34,",,7",34,",,7,2100",,,,,,,,,,,,,,
USC00361354,"CHAMBERSBURG, PA US",39.9353,-77.6394,195.1,2016-01-02,0,",,7,2100",0,",,7",0,",,7",42,",,7",28,",,7",29,",,7,2100",,,,,,,,,,,,,,

```

#### Raw Data Acquisition

The CSV files were requested from the [NOAA Climate Data Online Search](https://www.ncdc.noaa.gov/cdo-web/search) for full calendar years 2014, 2015, and 2016, and then again for all available data in 2017.  The resulting CSV files were uploaded to AWS S3 to be programmatically retrieved by the script [get_data.py](../scripts/get_data.py).  The raw files are processed by the script [parse_weather.py](../scripts/parse_weather.py) to produce daily weather summaries.
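As a rough illustration of the parsing step, the sketch below turns rows in the NOAA layout into simple daily summaries. It is not the contents of `parse_weather.py`: the function names and output keys are hypothetical, and the sample keeps only a subset of the columns shown in the Description, using the values from the two example rows.

```python
import csv
import io

# Columns trimmed for brevity; values match the example rows in the Description.
SAMPLE = '''\
STATION,NAME,DATE,PRCP,TMAX,TMIN
USC00361354,"CHAMBERSBURG, PA US",2016-01-01,0,38,34
USC00361354,"CHAMBERSBURG, PA US",2016-01-02,0,42,28
'''


def to_number(value):
    """Convert a CSV cell to a float, treating blank cells as missing."""
    return float(value) if value.strip() else None


def parse_weather(text):
    """Return one summary dict per station per day."""
    summaries = []
    for row in csv.DictReader(io.StringIO(text, newline="")):
        summaries.append(
            {
                "station": row["STATION"],
                "date": row["DATE"],
                "high": to_number(row["TMAX"]),
                "low": to_number(row["TMIN"]),
                "precipitation": to_number(row["PRCP"]),
            }
        )
    return summaries


daily = parse_weather(SAMPLE)
for day in daily:
    print(day["date"], day["high"], day["low"], day["precipitation"])
```

Treating blank cells as missing rather than zero avoids recording a false 0-degree reading on days when a station did not report a value.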

#### Future Improvements

To automate the retrieval of ongoing weather information, the analysis could be supplemented with scripted interaction with the [NOAA weather API](https://www.ncdc.noaa.gov/cdo-web/webservices/v2).
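A scripted request to that API might be assembled as sketched below. The endpoint path, the `token` header, and the query parameters reflect the publicly documented Climate Data Online v2 web service as best understood here and should be checked against the linked documentation before use; the function name is hypothetical, the token is a placeholder, and the station ID is taken from the example data above. The sketch only builds the request and does not send it.

```python
import urllib.parse
import urllib.request

# Assumed base endpoint for the CDO v2 "data" resource; verify against the NOAA docs.
BASE_URL = "https://www.ncdc.noaa.gov/cdo-web/api/v2/data"


def build_daily_summary_request(token, station_id, start_date, end_date):
    """Build (but do not send) a request for daily summaries from one station."""
    params = {
        "datasetid": "GHCND",                # daily summaries dataset
        "stationid": f"GHCND:{station_id}",
        "startdate": start_date,             # ISO dates, e.g. 2018-01-01
        "enddate": end_date,
        "units": "standard",                 # Fahrenheit and inches
        "limit": 1000,
    }
    url = f"{BASE_URL}?{urllib.parse.urlencode(params)}"
    # The API expects the access token in a request header named "token".
    return urllib.request.Request(url, headers={"token": token})


request = build_daily_summary_request(
    token="YOUR-NOAA-TOKEN",  # placeholder; a real token must be requested from NOAA
    station_id="USC00361354",
    start_date="2018-01-01",
    end_date="2018-01-31",
)
print(request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would perform the call; responses are paginated, so a full year of data would need repeated requests with an increasing offset.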