# Calculate Yield Response to Variable Nitrogen Rates
---

**Name**: Adrian Correndo

**Semester**: Spring 2019

**Project area**: Agronomy

## **Objective**

Automate the calculation of the grain yield (GY) response to different rates of nitrogen (N) fertilizer and of the related nitrogen use efficiencies (NUE). For each experiment, obtain the GY with no N added (Y0) and the maximum observed yield (Ymax). Create subgroups of experiments based on soil texture (STx).

## **Data example**
![Example](https://github.com/adriancorrendo/project/Scatter.JPG)

## **Outcomes**

A `*.csv` file with up to 8 columns: Trial, STx, Nrate, GY, Y0, Ymax, NR, and NUE, where:

- **Y0**: GY when Nrate = 0;
- **Ymax**: maximum observed GY;
- **NR**: absolute N response (GY - Y0) for each fertilizer rate other than 0;
- **NUE**: N use efficiency for each fertilized Trial-Nrate combination.
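The quantities above can be written compactly for trial $t$ and N rate level $i$. The NUE line is an assumption: the project does not spell out which efficiency index NUE refers to, so this uses agronomic efficiency (yield response per unit of N applied), one common choice.

$$
\begin{aligned}
Y_{0,t} &= GY_{t}\big|_{Nrate = 0} \\
Y_{\max,t} &= \max_{i} GY_{t,i} \\
\Delta Y_t &= Y_{\max,t} - Y_{0,t} \\
NR_{t,i} &= GY_{t,i} - Y_{0,t}, \qquad Nrate_i > 0 \\
NUE_{t,i} &= \frac{NR_{t,i}}{Nrate_i}, \qquad Nrate_i > 0 \quad \text{(assumed definition)}
\end{aligned}
$$

For example, in Trial 1 (Y0 = 13.317 t/ha) the 84 kg N rate yielded 14.434 t/ha, so NR = 1.117 t/ha; assuming Nrate is in kg N/ha, NUE = 1117 kg/ha / 84 kg N/ha, or about 13.3 kg of grain per kg of N.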

Potential challenges:

i) The Nrate levels (both how many there are and how much N, in kg, is applied) vary across trials;

ii) Y0 and Ymax are defined at the **Trial** level, whereas NR and NUE are defined at a sub-level, for each **Trial-Nrate combination**.
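One way to handle challenge ii in pandas is `groupby(...).transform`, which computes a trial-level statistic and broadcasts it back to every row of that trial, so trial-level values (Y0, Ymax) and plot-level values (NR) can sit in the same table regardless of how many N rates each trial has. This is a minimal sketch on the Trial 1 and Trial 2 values from the data; the column names follow the data file, while `plots` and `by_trial` are illustrative names.

```python
import pandas as pd

# Trials 1 and 2 (values from the sorted data table)
plots = pd.DataFrame({
    'TRIAL': [1] * 5 + [2] * 5,
    'Nrate': [0, 84, 140, 196, 280] * 2,
    'GY': [13.317, 14.434, 15.267, 15.405, 15.496,
           6.604, 10.691, 11.122, 11.246, 11.722],
})

by_trial = plots.groupby('TRIAL')

# Trial-level values, repeated on every plot of the same trial
plots['Ymax'] = by_trial['GY'].transform('max')
# Y0: keep GY only on the Nrate == 0 plot, then spread that value within the trial
plots['Y0'] = (plots['GY'].where(plots['Nrate'] == 0)
               .groupby(plots['TRIAL']).transform('first'))

# Plot-level (Trial-Nrate) value: NR only for fertilized plots, NaN for N0 plots
plots['NR'] = (plots['GY'] - plots['Y0']).where(plots['Nrate'] > 0)
print(plots)
```

Because `transform` returns a result aligned to the original rows, no explicit loop over trials is needed.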

## **Rationale**

The database consists of hundreds of corn nitrogen fertilizer experiments. Automating these calculations will save me a significant amount of time when processing and analyzing it.

## **Sketch**
![Main steps of the project](https://github.com/adriancorrendo/project/sketch.jpg)

## **Coding**
### Importing modules and datafile

```python
import os

import numpy as np
import pandas as pd

# Work from the project folder (adjust the path to your machine)
os.chdir('C:/Users/correndo/Desktop/Coding/project2/')
df = pd.read_csv('CornNFR.csv')  # columns: TRIAL, TEXT (soil texture), Nrate, GY
df.head(10)
```

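| | TRIAL | TEXT | Nrate | GY |
|---|---|---|---|---|
| 0 | 39 | clay | 0 | 3.7 |
| 1 | 39 | clay | 36 | 5.057 |
| 2 | 39 | clay | 62 | 5.392 |
| 3 | 39 | clay | 89 | 7.774 |
| 4 | 39 | clay | 116 | 8.765 |
| 5 | 39 | clay | 143 | 9.899 |
| 6 | 39 | clay | 170 | 10.426 |
| 7 | 39 | clay | 178 | 10.789 |
| 8 | 39 | clay | 196 | 10.506 |
| 9 | 3 | loamy_sand | 0 | 9.088 |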


### Sorting by "TRIAL" and by "Nrate" within TRIAL (both ascending)

```python
sdf = df.sort_values(['TRIAL', 'Nrate'], ascending=True)
sdf.head(10)  # first 10 rows
```

| | TRIAL | TEXT | Nrate | GY |
|---|---|---|---|---|
| 329 | 1 | silty_clay | 0 | 13.317 |
| 330 | 1 | silty_clay | 84 | 14.434 |
| 331 | 1 | silty_clay | 140 | 15.267 |
| 332 | 1 | silty_clay | 196 | 15.405 |
| 333 | 1 | silty_clay | 280 | 15.496 |
| 334 | 2 | silty_clay | 0 | 6.604 |
| 335 | 2 | silty_clay | 84 | 10.691 |
| 336 | 2 | silty_clay | 140 | 11.122 |
| 337 | 2 | silty_clay | 196 | 11.246 |
| 338 | 2 | silty_clay | 280 | 11.722 |


### Filtering for N0 plots using boolean indexing
- (Alternative: store the boolean mask in a separate variable first; longer)

```python
N0_plots = sdf[sdf.Nrate == 0].copy()  # plots with no N applied
N0_plots.head()
```

| | TRIAL | TEXT | Nrate | GY |
|---|---|---|---|---|
| 329 | 1 | silty_clay | 0 | 13.317 |
| 334 | 2 | silty_clay | 0 | 6.604 |
| 9 | 3 | loamy_sand | 0 | 9.088 |
| 73 | 4 | silt_loam | 0 | 9.742 |
| 14 | 5 | loamy_sand | 0 | 12.172 |
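Boolean indexing can be written in several equivalent ways. This sketch, on the Trial 1 rows only, puts the explicit boolean-variable form mentioned above next to `.loc` and `DataFrame.query`, and reuses the complement of the mask (`~mask`) to get the fertilized plots.

```python
import pandas as pd

# Trial 1 rows (values from the sorted data table)
sdf = pd.DataFrame({
    'TRIAL': [1] * 5,
    'TEXT': ['silty_clay'] * 5,
    'Nrate': [0, 84, 140, 196, 280],
    'GY': [13.317, 14.434, 15.267, 15.405, 15.496],
})

# Three equivalent ways to select the N0 (no N added) plots
mask = sdf['Nrate'] == 0              # explicit boolean variable
n0_mask = sdf[mask]
n0_loc = sdf.loc[sdf['Nrate'] == 0]   # .loc with an inline mask
n0_query = sdf.query('Nrate == 0')    # string expression

# Fertilized plots: every row where the N0 mask is False
nf_plots = sdf.loc[~mask]
print(n0_mask)
print(nf_plots)
```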


### Filtering for N fertilized plots (Nrate > 0) using boolean indexing
- Then, get the unique Nrates and their frequencies

```python
Nf_plots = sdf[sdf.Nrate > 0].copy()  # "!= 0" (different from zero) would also work here
print("The total number of N fertilized plots is:", len(Nf_plots))
Nrates = Nf_plots['Nrate'].unique().tolist()
print("The number of different N rates is:", len(Nrates))
Nrates_freq = Nf_plots['Nrate'].value_counts().tolist()
```

```
The total number of N fertilized plots is: 345
The number of different N rates is: 34
```


### Creating a df with all the N rates and their frequencies (count)

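```python
# Take rates and counts from the same value_counts() result so each N rate
# stays paired with its own count
Nrates = (Nf_plots['Nrate'].value_counts()
          .rename_axis('N rate')
          .reset_index(name='Counts')
          .sort_values('N rate'))
Nrates.head()
```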

| | N rate | Counts |
|---|---|---|
| 26 | 27 | 1 |
| 19 | 36 | 3 |
| 27 | 54 | 1 |
| 10 | 56 | 16 |
| 20 | 62 | 3 |
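A caution when building a table like this from two separate lists: `Series.unique()` returns values in order of first appearance, whereas `Series.value_counts()` sorts by count (descending), so pairing the two lists position by position can attach counts to the wrong N rates. The same applies to the soil texture table. The sketch below uses hypothetical N rates to show the mismatch and one way to keep each rate paired with its own count.

```python
import pandas as pd

# Hypothetical N rates of five fertilized plots, in the order they appear
nrate = pd.Series([84, 140, 140, 196, 140], name='Nrate')

print(nrate.unique().tolist())        # order of first appearance: [84, 140, 196]
print(nrate.value_counts().tolist())  # sorted by count, descending: [3, 1, 1]

# Zipping those two lists would pair 84 with 3, which is wrong.
# Keeping rates and counts in one Series avoids the mismatch:
counts = (nrate.value_counts()
          .rename_axis('N rate')
          .reset_index(name='Counts')
          .sort_values('N rate', ignore_index=True))
print(counts)
```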


### How many unique types of soil texture are in the database?

```python
text_class = N0_plots['TEXT'].unique().tolist()
text_freq = N0_plots['TEXT'].value_counts().tolist()
print("The number of different soil texture classes is:", len(text_class))
```

```
The number of different soil texture classes is: 6
```


### Creating a df with all the Soil Texture Classes and their frequencies (count)

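```python
# Classes and counts come from one value_counts() result, so they stay paired
STx = (N0_plots['TEXT'].value_counts()
       .rename_axis('Texture Class')
       .reset_index(name='Frequency'))
STx
```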

| | Texture Class | Frequency |
|---|---|---|
| 0 | silty_clay | 36 |
| 1 | loamy_sand | 8 |
| 2 | silt_loam | 7 |
| 3 | sandy_loam | 4 |
| 4 | clay | 3 |
| 5 | silty_clay_loam | 1 |


### Subsets by TRIALS
- Example with Trial_1, and estimating the Ymax for it

```python
trial_1 = sdf[sdf.TRIAL == 1]
max_GY = trial_1.GY.max()
print('The Ymax of Trial#1 is:', round(max_GY, 2), 't/ha')
trial_1.head()
```

```
The Ymax of Trial#1 is: 15.5 t/ha
```


| | TRIAL | TEXT | Nrate | GY |
|---|---|---|---|---|
| 329 | 1 | silty_clay | 0 | 13.317 |
| 330 | 1 | silty_clay | 84 | 14.434 |
| 331 | 1 | silty_clay | 140 | 15.267 |
| 332 | 1 | silty_clay | 196 | 15.405 |
| 333 | 1 | silty_clay | 280 | 15.496 |


### Grouping by TRIAL into the `trials` variable (a GroupBy object with one sub-df per trial)

```python
trials = sdf.groupby("TRIAL")
```

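### For loop to store a sub-df for each trial

```python
# Keep each trial's sub-df in a dict keyed by its TRIAL number
trial_dfs = {}
for trial, trial_df in trials:
    trial_dfs[trial] = trial_df
```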

```python
# Maximum observed GY per trial; the groupby aggregates every trial at once
Ymax = trials.GY.max().to_frame(name='GYmax')
Ymax.head()
```

| TRIAL | GYmax |
|---|---|
| 1 | 15.496 |
| 2 | 11.722 |
| 3 | 13.15 |
| 4 | 11.473 |
| 5 | 14.523 |
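A single `groupby(...).agg` call with named aggregation can collect several trial-level values in one table. This sketch uses Trial 1 and the Trial 39 rows shown in the first table; the `n_rates` column (number of distinct Nrate levels per trial, including 0) is an illustrative addition that makes challenge i visible: 5 levels for Trial 1 versus 9 for Trial 39 in this subset.

```python
import pandas as pd

# Trial 1 and the Trial 39 rows from the data tables above
sdf = pd.DataFrame({
    'TRIAL': [1] * 5 + [39] * 9,
    'TEXT': ['silty_clay'] * 5 + ['clay'] * 9,
    'Nrate': [0, 84, 140, 196, 280,
              0, 36, 62, 89, 116, 143, 170, 178, 196],
    'GY': [13.317, 14.434, 15.267, 15.405, 15.496,
           3.7, 5.057, 5.392, 7.774, 8.765, 9.899, 10.426, 10.789, 10.506],
})

# One row per trial: soil texture, Ymax, and how many N rate levels were tested
trial_summary = sdf.groupby('TRIAL').agg(
    STx=('TEXT', 'first'),
    Ymax=('GY', 'max'),
    n_rates=('Nrate', 'nunique'),
)
print(trial_summary)
```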


### Merging the Y0 and Ymax data

```python
# Y0 rows come from the N0 plots; Ymax is matched through its TRIAL index level
df_Y = pd.merge(N0_plots, Ymax, on='TRIAL')
df_Y.head()
```

| | TRIAL | TEXT | Nrate | GY | GYmax |
|---|---|---|---|---|---|
| 0 | 1 | silty_clay | 0 | 13.317 | 15.496 |
| 1 | 2 | silty_clay | 0 | 6.604 | 11.722 |
| 2 | 3 | loamy_sand | 0 | 9.088 | 13.15 |
| 3 | 4 | silt_loam | 0 | 9.742 | 11.473 |
| 4 | 5 | loamy_sand | 0 | 12.172 | 14.523 |


### Drop 'Nrate' column

```python
df_Y = df_Y.drop(columns=['Nrate'])  # every row is an N0 plot, so Nrate is always 0
```

### Renaming columns

```python
# Rename by name rather than by position, so column order does not matter
df_Y = df_Y.rename(columns={'TEXT': 'STx', 'GY': 'Y0', 'GYmax': 'Ymax'})
df_Y.head()
```

| | TRIAL | STx | Y0 | Ymax |
|---|---|---|---|---|
| 0 | 1 | silty_clay | 13.317 | 15.496 |
| 1 | 2 | silty_clay | 6.604 | 11.722 |
| 2 | 3 | loamy_sand | 9.088 | 13.15 |
| 3 | 4 | silt_loam | 9.742 | 11.473 |
| 4 | 5 | loamy_sand | 12.172 | 14.523 |


### Estimating the maximum N response (Delta-Y = Ymax - Y0)
- Vectorized column subtraction (no loop needed)

```python
# Delta-Y for all trials at once
Max_NR = (df_Y.Ymax - df_Y.Y0).to_frame(name='MaxNR')
Max_NR.head()
```

| | MaxNR |
|---|---|
| 0 | 2.179 |
| 1 | 5.118 |
| 2 | 4.062 |
| 3 | 1.731 |
| 4 | 2.351 |


### Adding Delta-Y as a new column in df_Y
- Using `DataFrame.insert`

```python
df_Y.insert(4, 'Delta-Y', Max_NR.MaxNR)  # position 4 = right after Ymax
df_Y.head()
```

| | TRIAL | STx | Y0 | Ymax | Delta-Y |
|---|---|---|---|---|---|
| 0 | 1 | silty_clay | 13.317 | 15.496 | 2.179 |
| 1 | 2 | silty_clay | 6.604 | 11.722 | 5.118 |
| 2 | 3 | loamy_sand | 9.088 | 13.15 | 4.062 |
| 3 | 4 | silt_loam | 9.742 | 11.473 | 1.731 |
| 4 | 5 | loamy_sand | 12.172 | 14.523 | 2.351 |


### **Next Steps:**
 - Exploring N responses to each fertilizer rate.
 - Estimating their corresponding efficiencies.

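```python
# Attach each trial's Y0 to all of its plots, then compute NR = GY - Y0
# for the fertilized plots (NR stays NaN for the N0 plots)
sdf_NR = sdf.merge(df_Y[['TRIAL', 'Y0']], on='TRIAL')
sdf_NR['NR'] = np.where(sdf_NR.Nrate == 0, np.nan, sdf_NR.GY - sdf_NR.Y0)
sdf_NR.head()
```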
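For the second next step (efficiencies), a minimal sketch, assuming NUE is agronomic efficiency (NR divided by Nrate, expressed as kg of grain per kg of N) and that GY is in t/ha and Nrate in kg N/ha; neither the NUE definition nor the Nrate unit is stated in the project. It uses the Trial 1 values only, with a one-row stand-in for `df_Y`.

```python
import numpy as np
import pandas as pd

# Trial 1 plots (values from the tables above)
sdf = pd.DataFrame({
    'TRIAL': [1] * 5,
    'Nrate': [0, 84, 140, 196, 280],                 # assumed kg N/ha
    'GY': [13.317, 14.434, 15.267, 15.405, 15.496],  # t/ha
})
df_Y = pd.DataFrame({'TRIAL': [1], 'Y0': [13.317]})  # stand-in for the real df_Y

plots = sdf.merge(df_Y, on='TRIAL')
fert = plots['Nrate'] > 0

# NR (t/ha) only for fertilized plots; N0 plots keep NaN
plots['NR'] = np.where(fert, plots['GY'] - plots['Y0'], np.nan)
# Assumed NUE: x1000 converts t/ha of grain to kg/ha, then divide by kg N/ha
plots['NUE'] = np.where(fert, 1000 * plots['NR'] / plots['Nrate'], np.nan)
print(plots.round(3))
```

Writing the result with `plots.to_csv(...)` would then give the Trial, Nrate, GY, Y0, NR, and NUE columns of the planned output file.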


```python
print(type(trials))
```

```
<class 'pandas.core.groupby.groupby.DataFrameGroupBy'>
```


```python
trial_1
```