Data initialization and validation tracking issue #2

Open
12 of 33 tasks
EwoutH opened this issue May 24, 2023 · 28 comments

@EwoutH
Owner

EwoutH commented May 24, 2023

This is a tracking issue for all the data we might need, for either initialization or validation.

@nit1995 Thanks for taking this on! Can you take the lead on the Drought and Irrigation parts? Please also discuss these things directly with Kaveri (since she is the subject matter expert).

See also the pseudocode: https://docs.google.com/document/d/1Fay3uEJRzJnAenQnd0Pa9e5AOzRPa5O7vOzR0pKnI5M/edit

Farming

  • Drought
    • How can we define drought? (rain deficit in some sequential months?)
    • Can we find a correlation/relation between drought and yield? (from this data or other)
  • Crops (notebook)
    • How much of each crop is produced? [kg]
    • How much area is used for each crop? [ha/kg]
    • How much is each crop worth? [Rs/kg]
    • How much does it cost to farm each crop (seeds, fertilizer, labour)? [Rs/ha]
    • When sowing a new type of crop, how many years does it take to reach full yield (if any)? [% curve to 100%]
    • What initial investment is needed for a new crop? [Rs/ha]
  • Irrigation
    • Which types of irrigation can be used with which crops [yes/no lookup table]
    • How much does each type of irrigation cost? [Rs/ha]
    • Can multiple types of irrigation be present on the same farm land? If so, which can and which can not?
    • Does having irrigation take any space that can't be used to produce crops?
    • Does irrigation have a maximum capacity, or can it handle any drought?
  • Farm land
    • How much farm land do farmers have? [distribution function in ha]
    • How much of it is already irrigated, with which types?
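The first Drought question above could start from a simple run-length rule. A minimal sketch, assuming (hypothetically) that a drought is a run of at least three consecutive months with rainfall below 75% of the long-term mean for that month; the thresholds and the name `find_droughts` are placeholders, not an agreed definition:

```python
from typing import List, Tuple


def find_droughts(monthly_rain: List[float], monthly_normal: List[float],
                  threshold: float = 0.75, min_months: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) month indices (inclusive) of droughts.

    Hypothetical definition: at least `min_months` consecutive months in which
    rainfall is below `threshold` times the long-term mean for that month.
    """
    droughts = []
    run_start = None
    for i, (rain, normal) in enumerate(zip(monthly_rain, monthly_normal)):
        dry = rain < threshold * normal
        if dry and run_start is None:
            run_start = i  # a dry run begins
        elif not dry and run_start is not None:
            if i - run_start >= min_months:
                droughts.append((run_start, i - 1))
            run_start = None
    # Close a dry run that lasts until the end of the series
    if run_start is not None and len(monthly_rain) - run_start >= min_months:
        droughts.append((run_start, len(monthly_rain) - 1))
    return droughts
```

The rainfall deficit of each detected run (normal minus actual, summed over its months) could then serve as the drought size used in the yield questions.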

Financial

  • Income
    • How much savings does the average farming household have?
    • What are the yearly living costs of the average farming household?
  • Investments
    • How much do different types of irrigation cost? [Rs/ha]
    • Do we want any other investments? Additional farmland?
  • Borrowing / lending
    • From whom can farmers borrow?
    • How many years do loans run? Are they paid off periodically?

Network

  • On which homophily characteristics is the network initialized?
  • What's the range/distribution of the number of neighbours?
  • Do we have neighbours and Joint Liability Groups?
  • Do we want the network to update over time?
  • Do we want some sort of trust/liability (if so, how?)

conceptual-mindmap-India-drought-2023-05-22.zip

@Kaveri3012

Thanks a lot, Ewout. Working with Niteesh on this.

@nit1995
Collaborator

nit1995 commented May 30, 2023

Can multiple types of irrigation be present on the same farm land? If so, which can and which can not?
Yes. Different crops have different irrigation needs, and if the farmer is diversifying his crop, it is possible to have multiple types of irrigation on the same farm land.

Does having irrigation take any space that can't be used to produce crops?
Unless the farmer builds farm ponds, the space taken by irrigation is negligible and can be ignored.

Does irrigation have a maximum capacity, or can it handle any drought?
Irrigation systems depend on a reliable water source, which can include rivers, reservoirs, wells, or aquifers. However, during an intense drought, these sources might not yield sufficient water to satisfy the requirements of irrigation. While borewells generally continue to provide water during drought conditions, their effectiveness can be significantly reduced during extended periods of drought.

https://www.deccanherald.com/state/cdurga-villages-answer-drought-713763.html
https://www.livemint.com/Politics/CF05j4ycqjUmMI4qDQ8IzI/Running-out-of-water-droughthit-Karnataka-to-rent-private.html

@EwoutH
Owner Author

EwoutH commented May 30, 2023

@nit1995 Thanks a lot for doing the research on this!

Yes. Different crops have different irrigation needs, and if the farmer is diversifying his crop, it is possible to have multiple types of irrigation on the same farm land.

Can you make a yes/no table of which of these crops can use which of these irrigation types? ['Maize', 'Pigeonpea', 'Sorghum', 'Chickpea', 'Groundnut', 'Finger millet']

Unless the farmer builds farm ponds, the space taken by irrigation is negligible and can be ignored.

Perfect, let’s assume irrigation doesn’t take significant space.

Irrigation systems depend on a reliable water source, which can include rivers, reservoirs, wells, or aquifers. However, during an intense drought, these sources might not yield sufficient water to satisfy the requirements of irrigation. While borewells generally continue to provide water during drought conditions, their effectiveness can be significantly reduced during extended periods of drought.

@Kaveri3012 and @nit1995 can you discuss how this translates to yield? I think we could quantify it as a precipitation deficit, of which each irrigation system can handle some amount. Then we need functions for:

  • How often precipitation deficits (droughts) of a given size occur
  • Which irrigation type can handle what amount
  • How the remaining precipitation deficit after irrigation translates to a reduction in yield (for each crop?)
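The last two functions above can be chained per farmer and season. A minimal sketch, assuming hypothetical per-system capacities (in mm of precipitation deficit) and a linear yield loss per mm of deficit left after irrigation; all numbers and names are placeholders until the literature or ICRISAT data give real values:

```python
# Hypothetical capacities: the precipitation deficit (mm per season) each
# irrigation type can compensate. Placeholder values, not data.
IRRIGATION_CAPACITY_MM = {"rainfed": 0.0, "canal": 100.0, "borewell": 150.0}


def remaining_deficit(deficit_mm: float, irrigation: str) -> float:
    """Precipitation deficit left after irrigation has covered what it can."""
    return max(0.0, deficit_mm - IRRIGATION_CAPACITY_MM[irrigation])


def yield_factor(deficit_mm: float, irrigation: str, loss_per_mm: float) -> float:
    """Fraction of normal yield, assuming a linear loss per mm of remaining deficit.

    `loss_per_mm` would be crop-specific; the linear form is an assumption.
    """
    return max(0.0, 1.0 - loss_per_mm * remaining_deficit(deficit_mm, irrigation))


print(yield_factor(200.0, "canal", loss_per_mm=0.002))  # 100 mm left uncovered
```

A linear loss is the simplest shape to calibrate; a threshold or S-shaped curve per crop could replace `yield_factor` without changing the interface.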

@nit1995
Collaborator

nit1995 commented May 31, 2023

Can you make a yes/no table of which of these crops can use which of these irrigation types? ['Maize', 'Pigeonpea', 'Sorghum', 'Chickpea', 'Groundnut', 'Finger millet']

I've added a lookup table for these crops. However, @Kaveri3012 and I are looking at better defining the geographical scope based on the sample size of the respondents belonging to these districts in the CMIE data. Based on the districts chosen, the major crops might change. I will update it soon.

can you discuss how this translates to yield?

Using ICRISAT data to run a regression is tricky because it doesn't control for the type of irrigation or any other factors that might influence yield.

I have been looking at the literature to see if I can find another estimate. This paper shows a strong correlation between rainfall from June to September and yield for different crops. Another paper, however, shows that exposure to extreme temperatures affects yield more than rainfall does, and that the impact of rainfall (for rice) is more significant in rainfed than in irrigated conditions. If I don't find any secondary literature on these crops, I might have to use the ICRISAT data itself.

Which irrigation type can handle what amount

This is tough to quantify. Borewells generally provide respite during droughts, but how long they run depends on groundwater levels and also on the type of irrigation the farmers use. Irrigation types that draw water from canals depend on whether the rivers are flowing and whether enough water has been released into the canal.

@Kaveri3012

@nit1995 Could you upload the data on the look up tables (and data on farm-land etc ) by the end of today? We already discussed this two days ago. The more data we have sooner, the better, so @EwoutH can build a better model to start with.

@EwoutH the link between rainfall deficit and drought seems more complicated to quantify. But we'll try and have something by the end of this week.

@Kaveri3012

Kaveri3012 commented Jun 1, 2023

@nit1995 please also list out which of the five districts we will be specifically looking at in Karnataka, and the reasons for choosing those districts, and any related limitations of making that choice.

We will then have to find both demography-related (income, consumption trends) and crop-related data across the three farmer groups for those five districts, and send them to Ewout.

@EwoutH
Owner Author

EwoutH commented Jun 1, 2023

Great work, it’s appreciated!

@EwoutH
Owner Author

EwoutH commented Jun 4, 2023

@nit1995 Could you try to deliver all data as tidy data? That looks like this:

The 3 rules of Tidy Data


  1. Each variable is a column
  2. Each observation is a row
  3. Each type of observational unit is a table

For example, the Farmland data would go from this:

Year,MARGINAL NUMBER (1000 Number),MARGINAL AREA (1000 ha),SMALL NUMBER (1000 Number),SMALL AREA (1000 ha),SEMI MEDIUM NUMBER (1000 Number),SEMI MEDIUM AREA (1000 ha),MEDIUM NUMBER (1000 Number),MEDIUM AREA (1000 ha),LARGE NUMBER (1000 Number),LARGE AREA (1000 ha),TOTAL NUMBER (1000 Number),TOTAL AREA (1000 ha)
2005,410.36,219.6,368.65,528.98,250.65,679.01,113.41,654.25,16.25,234.2,1159.3,2316.0
2010,462.99,251.24,398.2,565.31,250.05,669.32,104.88,595.01,14.11,198.31,1230.22,2279.19

to this:

Number Area Area per farmer
Marginal 462990.0 251240.0 0.542647
Small 398200.0 565310.0 1.419663
Semi medium 250050.0 669320.0 2.676745
Medium 104880.0 595010.0 5.673246
Large 14110.0 198310.0 14.054571
Total 1230220.0 2279190.0 1.852669

I've now done this in this notebook.

If you have more than two axes (for example, if you also want to keep the different years, with 1) the year, 2) the farmer size and 3) the attribute as axes), please use multi-indexing. In that case, also feel free to save as a Pickle (DataFrame.to_pickle) instead of a CSV.

I hope this is possible, if you have any questions please let me know!
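The reshaping described above can be done with pandas. A minimal sketch, using only some columns of the wide Farmland table (the other size classes follow the same naming pattern); the regular expression assumes column names of the form `<CLASS> <NUMBER|AREA> (unit)`:

```python
import pandas as pd

# Some columns of the wide Farmland table above (values in thousands)
wide = pd.DataFrame({
    "Year": [2005, 2010],
    "MARGINAL NUMBER (1000 Number)": [410.36, 462.99],
    "MARGINAL AREA (1000 ha)": [219.6, 251.24],
    "SEMI MEDIUM NUMBER (1000 Number)": [250.65, 250.05],
    "SEMI MEDIUM AREA (1000 ha)": [679.01, 669.32],
})

# 1) One row per (year, column) observation
long = wide.melt(id_vars="Year", var_name="column", value_name="value")

# 2) Split the column name into farmer size class and attribute
parts = long["column"].str.extract(r"^(?P<size>.+?) (?P<attribute>NUMBER|AREA) \(")
long = pd.concat([long[["Year", "value"]], parts], axis=1)
long["value"] *= 1000  # thousands -> absolute number of farmers / ha

# 3) Year x size class as a MultiIndex, attributes as columns
tidy = long.set_index(["Year", "size", "attribute"])["value"].unstack("attribute")
tidy["Area per farmer"] = tidy["AREA"] / tidy["NUMBER"]

print(tidy.loc[2010])
# tidy.to_pickle("farmland_tidy.pkl")  # a pickle keeps the MultiIndex intact
```

Keeping the year in the index rather than in column names means a new census year is just extra rows, with no change to the code that reads the table.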

@EwoutH
Owner Author

EwoutH commented Jun 4, 2023

@Kaveri3012 For now I just assume the Area per farmer value ±25% for each farmer type. If you would like another approach, please let me know.

Edit: Are there any other properties we would like to link to farmer type? Like initial wealth or living costs?
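One way to read "Area per farmer ±25%" is a uniform draw around each class average. A minimal sketch, assuming a uniform distribution (the comment above does not specify one) and using the 2010 averages from the tidy table; `draw_farm_size` is a hypothetical name:

```python
import random

# Area per farmer (ha) per class, 2010 values from the tidy table above
AREA_PER_FARMER_HA = {
    "Marginal": 0.542647,
    "Small": 1.419663,
    "Semi medium": 2.676745,
    "Medium": 5.673246,
    "Large": 14.054571,
}


def draw_farm_size(farmer_class: str, rng: random.Random, spread: float = 0.25) -> float:
    """Draw a farm size uniformly within +-spread of the class average (uniform is assumed)."""
    mean = AREA_PER_FARMER_HA[farmer_class]
    return rng.uniform(mean * (1 - spread), mean * (1 + spread))


rng = random.Random(42)  # seeded for reproducible model runs
sizes = [draw_farm_size("Small", rng) for _ in range(5)]
```

Note that ±25% can push a farm outside its census class (for example, a Small farm below 1 ha), which is one reason to prefer a single continuous distribution over all classes.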

@nit1995
Collaborator

nit1995 commented Jun 5, 2023

@Kaveri3012 For now I just assume the Area per farmer value ±25% for each farmer type. If you would like another approach, please let me know.

@EwoutH the classification of farmers based on land holdings is as follows:
Marginal: < 1 ha
Small: 1-2 ha
Semi-medium: 2-4 ha
Medium: 4-10 ha
Large: >10 ha

Source: Agricultural Census 2015-16
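These thresholds translate directly into a lookup. A minimal sketch, assuming each lower bound is inclusive (so exactly 1 ha counts as Small and exactly 10 ha as Large), which the census classes above leave implicit:

```python
from bisect import bisect_right

# Upper bounds (ha) of the Agricultural Census 2015-16 size classes
UPPER_BOUNDS_HA = [1, 2, 4, 10]
CLASSES = ["Marginal", "Small", "Semi medium", "Medium", "Large"]


def classify_farm(area_ha: float) -> str:
    """Map a farm size in ha to its census size class."""
    return CLASSES[bisect_right(UPPER_BOUNDS_HA, area_ha)]
```

`bisect_right` returns the number of bounds that are less than or equal to the area, which is exactly the index of the matching class.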

@EwoutH
Owner Author

EwoutH commented Jun 5, 2023

Thanks! Can you find or calculate a distribution function or histogram (bins) by any chance?

@nit1995
Collaborator

nit1995 commented Jun 5, 2023

Thanks! Can you find or calculate a distribution function or histogram (bins) by any chance?

I am unable to find a distribution function or a histogram with bins. The data available only gives the total number of farmers and the aggregate area under each category of farmers. I plotted a histogram with this data, but the classification of land holdings does not provide equal intervals.

@EwoutH
Owner Author

EwoutH commented Jun 6, 2023

I did some experimentation and found that the farm sizes quite closely resemble a lognormal distribution!

After some tuning I landed on a lognormal distribution with shape=0.92 and scale=1.25. It results in this distribution:

(Figure: the fitted lognormal farm-size distribution)

And results in the following metrics for each classification bin:

Number Area Area per farmer
Marginal 497476 287813.175517 0.578547
Small 357956 513225.280386 1.433766
Semi medium 248166 689381.443650 2.777904
Medium 111991 645684.047012 5.765499
Large 14631 211130.155641 14.430330

Which quite well resembles the table above!

Once we have the farm size data per district, we can fit a lognormal distribution for each district by estimating the shape and scale parameters.

See the 4_India-ABM-farmland-size-distribution-function.ipynb
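The per-class metrics can also be computed analytically from the distribution rather than from sampled farms. A minimal sketch with `scipy.stats.lognorm`, shape 0.92, scale 1.25 and the 2010 total of 1,230,220 farmers; it relies on the standard property that the area-weighted (size-biased) version of a lognormal is again lognormal, with the scale multiplied by exp(shape²). The name `bin_metrics` is hypothetical, and the numbers will differ somewhat from the sampled table above:

```python
import numpy as np
from scipy.stats import lognorm

CLASSES = ["Marginal", "Small", "Semi medium", "Medium", "Large"]
EDGES_HA = np.array([0.0, 1.0, 2.0, 4.0, 10.0, np.inf])  # census class boundaries


def bin_metrics(shape: float, scale: float, n_farmers: int):
    """Expected number of farms and total area (ha) per size class."""
    farms = lognorm(shape, scale=scale)
    # Area-weighted (size-biased) distribution: lognormal with scale * exp(shape**2)
    area_weighted = lognorm(shape, scale=scale * np.exp(shape**2))
    number = n_farmers * np.diff(farms.cdf(EDGES_HA))
    area = n_farmers * farms.mean() * np.diff(area_weighted.cdf(EDGES_HA))
    return number, area


number, area = bin_metrics(0.92, 1.25, n_farmers=1_230_220)
for name, n, a in zip(CLASSES, number, area):
    print(f"{name:12s} {n:10.0f} {a:12.0f} {a / n:8.3f}")
```

Computing the expectations directly removes sampling noise, which makes the error values easier to compare when tuning parameters.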

@EwoutH
Owner Author

EwoutH commented Jun 7, 2023

@nit1995 please let me know if anything is unclear, you want to discuss something or you’re stuck on something!

@nit1995
Collaborator

nit1995 commented Jun 7, 2023

@EwoutH I had the district-wise farm size data in a format similar to this:

Number Area Area per farmer
Marginal 462990.0 251240.0 0.542647
Small 398200.0 565310.0 1.419663
Semi medium 250050.0 669320.0 2.676745
Medium 104880.0 595010.0 5.673246
Large 14110.0 198310.0 14.054571
Total 1230220.0 2279190.0 1.852669

I was not sure how to estimate the distribution, though.

@Kaveri3012

Hi @EwoutH,

A couple of clarifications from Niteesh and me:

  1. We don't have data on the sizes of individual farms in Karnataka, only the aggregates from the ICRISAT website. Niteesh and I are therefore not sure how to go from the data table above to the parameters of the log-normal distribution that you defined as below:

    # Define the shape of the distribution
    # The parameters can be adjusted based on the characteristics of your specific data
    shape, loc, scale = 0.92, 0, 1.25

  2. As a workaround, we are trying to find empirical papers with density plots of farm sizes in India or Karnataka (or elsewhere). If we find good papers, we can potentially use that data to estimate the parameters of the log-normal distribution. Does that sound okay?

@EwoutH
Owner Author

EwoutH commented Jun 8, 2023

Thanks, I can take care of the parameter estimation. It's nothing more than playing around a bit and checking whether the error values go down.
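Because only class aggregates are available, the parameters can also be estimated by least squares on the bin shares instead of by hand. A minimal sketch, not necessarily what the notebook does: it fits shape and scale to the 2010 share of farmers and share of area per class with `scipy.optimize.minimize` (Nelder-Mead), starting from the hand-tuned 0.92 and 1.25:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import lognorm

EDGES_HA = np.array([0.0, 1.0, 2.0, 4.0, 10.0, np.inf])
# 2010 aggregates from the tidy table above (Marginal ... Large)
OBS_NUMBER = np.array([462990, 398200, 250050, 104880, 14110], dtype=float)
OBS_AREA = np.array([251240, 565310, 669320, 595010, 198310], dtype=float)


def predicted_shares(shape: float, scale: float):
    """Share of farms and share of area per class under a lognormal(shape, scale)."""
    farm_share = np.diff(lognorm(shape, scale=scale).cdf(EDGES_HA))
    area_share = np.diff(lognorm(shape, scale=scale * np.exp(shape**2)).cdf(EDGES_HA))
    return farm_share, area_share


def loss(log_params: np.ndarray) -> float:
    """Sum of squared share errors; parameters are optimised on a log scale to stay positive."""
    shape, scale = np.exp(log_params)
    farm_share, area_share = predicted_shares(shape, scale)
    return float(np.sum((farm_share - OBS_NUMBER / OBS_NUMBER.sum()) ** 2)
                 + np.sum((area_share - OBS_AREA / OBS_AREA.sum()) ** 2))


start = np.log([0.92, 1.25])  # hand-tuned values as starting point
result = minimize(loss, start, method="Nelder-Mead")
shape_fit, scale_fit = np.exp(result.x)
print(f"shape={shape_fit:.3f}, scale={scale_fit:.3f}, error={result.fun:.2e}")
```

The same function can be run per district once district-level aggregates are available, by swapping in that district's `OBS_NUMBER` and `OBS_AREA`.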

@EwoutH
Owner Author

EwoutH commented Jun 14, 2023

@nit1995 Any chance you can find price data on 'Castor', 'Linseed', 'Pearl millet' and 'Wheat'? Currently we only have pricing data on Chickpea, Finger millet, Groundnut, Maize, Paddy, Pigeonpea and Sorghum.

Otherwise we will have six crops in rotation: Chickpea, Finger millet, Groundnut, Maize, Pigeonpea and Sorghum (no Paddy, because no area data).

If so, please add them to the CSVs without changing the data structure.

If not, also no problem, because those 6 listed above are also the most grown by area in Karnataka.

@Kaveri3012

@nit1995 @EwoutH

  1. @nit1995 paddy is a very important crop -- let's try to find area data on it!
  2. can you clarify what percentage of all production these six crops cover @nit1995 ? If it is a substantial share (>50%), I think we can limit ourselves to the six crops... I doubt that we will uncover something grand and new in drought inequality dynamics with additional crops. I am concerned about leaving out paddy though.

best,
Kaveri

@EwoutH
Owner Author

EwoutH commented Jun 14, 2023

2. can you clarify what percentage of all production these six crops cover @nit1995 ? If it is a substantial share (>50%), I think we can limit ourselves to the six crops...

Without considering paddy, these six cover about 95% of the area. Including paddy, no idea.

Edit: I just noticed, we do have area data for rice, but not paddy, and do have price data for paddy, not rice. Probably we can just say rice = paddy, and all our problems are solved.

EwoutH added a commit that referenced this issue Jun 14, 2023
We do have area data for rice, but not paddy, and do have price data for paddy, not rice. Probably we can just say rice = paddy, and all our problems are solved.

See #2 (comment)
@EwoutH
Owner Author

EwoutH commented Jun 14, 2023

Paddy = rice solved a lot of problems, we now have 7 crops!

@Kaveri3012 I was thinking about how farmers estimate the expected return of switching crops. My initial idea is to take the market prices of the past 5 years for their current crop and for the crop they want to switch to, and compare those.
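The comparison proposed above can be sketched in a few lines, assuming price histories are ordered oldest first. Prices alone ignore differences in yield and cultivation cost between crops, so this is only the price component of the expected return; the function names are hypothetical:

```python
from statistics import mean


def expected_price(price_history: list[float], years: int = 5) -> float:
    """Average market price over the most recent `years` entries (oldest first)."""
    return mean(price_history[-years:])


def relative_price_gain(current: list[float], candidate: list[float], years: int = 5) -> float:
    """Relative difference between the candidate and current crop's recent average price."""
    base = expected_price(current, years)
    return (expected_price(candidate, years) - base) / base
```

Multiplying each average price by the crop's yield per ha and subtracting its cost per ha would turn this into a per-hectare return comparison.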

@Kaveri3012

Kaveri3012 commented Jun 14, 2023 via email

@Kaveri3012

Kaveri3012 commented Jun 14, 2023 via email

@EwoutH
Owner Author

EwoutH commented Jun 16, 2023

Changes in the code since Tuesday:

  • Rice/Paddy added to crop mix (17% area share), now 7 crops
  • Model generates dynamic crop prices, based on data Niteesh found
  • Farm sizes and classes are drawn at initialisation from a lognormal distribution

New assumptions:

JLG:

  • Select 40% of farmers in each district, up to medium farm size
  • Select 10-14 farmers from the same district and farm type
  • Same income and location (clustering), maybe with some randomness
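The JLG assumptions above could be implemented as follows. A minimal sketch, interpreting "40%" as 40% of the eligible (marginal to medium) farmers in each district and grouping the selected farmers by farm class into groups of 10 to 14; selected farmers left over (fewer than 10 in a class) stay unassigned, and income or location clustering is not modelled. The `Farmer` dataclass and `form_jlgs` are hypothetical:

```python
import random
from collections import defaultdict
from dataclasses import dataclass

ELIGIBLE_CLASSES = {"Marginal", "Small", "Semi medium", "Medium"}  # up to medium


@dataclass
class Farmer:  # hypothetical minimal agent
    id: int
    district: str
    farm_class: str


def form_jlgs(farmers, rng, share=0.4, min_size=10, max_size=14):
    """Return a list of JLGs (lists of Farmers), each from one district and farm class."""
    eligible_by_district = defaultdict(list)
    for f in farmers:
        if f.farm_class in ELIGIBLE_CLASSES:
            eligible_by_district[f.district].append(f)

    groups = []
    for members in eligible_by_district.values():
        selected = rng.sample(members, round(share * len(members)))  # random order
        by_class = defaultdict(list)
        for f in selected:
            by_class[f.farm_class].append(f)
        for pool in by_class.values():
            while len(pool) >= min_size:
                size = min(rng.randint(min_size, max_size), len(pool))
                groups.append(pool[:size])
                pool = pool[size:]
    return groups
```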

Neighbours:

  • Neighbours (friends and family): tune the number of neighbours until 30% can borrow

Lending:

  • 30% are lenders, determined every year
  • Keep a list of untrusted lenders (not paid back in time)

Yield function:

  • Still worked on.

Crop diversification:

  • If more than 1/3 of neighbours have that crop, consider, per share:
    • Calculate the NPV for next year (based on the 5-year average)
    • Replace at most one share a year
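The diversification rule above can be sketched as a per-farmer yearly step. This assumes each farm is divided into shares with one crop each, that the 5-year average net return discounted by one year gives next year's NPV, and that only crops not already grown on the farm count as candidates; all names are hypothetical:

```python
from statistics import mean


def npv_next_year(net_returns: list[float], discount_rate: float, years: int = 5) -> float:
    """Next year's expected net return (average of the last `years` years), discounted one year."""
    return mean(net_returns[-years:]) / (1 + discount_rate)


def popular_crops(neighbour_crops: list[set[str]], threshold: float = 1 / 3) -> set[str]:
    """Crops grown by more than `threshold` of the neighbours."""
    n = len(neighbour_crops)
    if n == 0:
        return set()
    candidates = set().union(*neighbour_crops)
    return {c for c in candidates if sum(c in crops for crops in neighbour_crops) / n > threshold}


def diversify(shares: list[str], neighbour_crops: list[set[str]],
              net_returns: dict[str, list[float]], discount_rate: float) -> list[str]:
    """Replace at most one farm share per year with a popular, higher-NPV crop."""
    options = popular_crops(neighbour_crops) - set(shares)
    if not options:
        return list(shares)
    value = {c: npv_next_year(net_returns[c], discount_rate) for c in options | set(shares)}
    best_new = max(options, key=value.__getitem__)
    worst = min(range(len(shares)), key=lambda i: value[shares[i]])
    updated = list(shares)
    if value[best_new] > value[shares[worst]]:
        updated[worst] = best_new  # the single allowed replacement this year
    return updated
```

Using inflation-adjusted net returns in `net_returns` would cover the normalisation point below.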

Expenditure:

  • Fit function based on income

Normalize for inflation:

  • Interest rate
  • Discount rates

Farmers

  • Do not borrow money to invest

@EwoutH
Owner Author

EwoutH commented Jun 18, 2023

@nit1995 For initialisation (and validation), I need to know how many different crops a farmer has on average (e.g. 30% have 1, 50% have 2, 20% have 3). This can be a lookup or probability function. It might depend on the farm size/class. Do you think you can find some data on that?
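Once a lookup like this exists, initialisation can draw the number of crops first and then which crops. A minimal sketch using the example shares from the comment above as placeholder probabilities (not data) and the seven crops in the model; a per-farm-class table would just be one such dictionary per class:

```python
import random

CROPS = ["Chickpea", "Finger millet", "Groundnut", "Maize", "Paddy", "Pigeonpea", "Sorghum"]

# Placeholder shares from the example above (30% one crop, 50% two, 20% three)
CROP_COUNT_PROBS = {1: 0.3, 2: 0.5, 3: 0.2}


def draw_initial_crops(rng: random.Random, probs: dict[int, float] = CROP_COUNT_PROBS) -> list[str]:
    """Draw how many crops a farmer grows, then which distinct crops (uniformly)."""
    k = rng.choices(list(probs), weights=list(probs.values()))[0]
    return rng.sample(CROPS, k)
```

Drawing crops uniformly ignores area shares; weighting the second step by each crop's share of cultivated area would be a natural refinement.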

@nit1995
Collaborator

nit1995 commented Jun 19, 2023

@EwoutH I can't seem to find any data on how many farmers do multiple cropping or on how many crops they sow. I just found this statistic, but it is not India-specific.

Only 5% of global rainfed cropland is under multiple cropping, whereas 40% of global irrigated cropland is under multiple cropping

@EwoutH
Owner Author

EwoutH commented Jun 20, 2023

Thanks for looking anyway.

I also need a formula for the maximum amount that (a member of) a joint liability group can finance.

It would also be nice to have an indication of the typical duration of loans.

@Kaveri3012 In the pseudocode I encountered notes saying that a JLG can borrow from a bank and notes saying that it can borrow from microfinance institutions. Is it both, or just one of the two?

Edit: I also need to know how to translate income into an amount to lend at a nationalised bank. Last year's income, or do you need to show a trend or something? 5-year average or minimum? Simple regression?

@Kaveri3012

Hi Ewout,

  1. Let's assume that a JLG can only borrow from a microfinance institution and not from a bank (this is an accurate assumption; sorry about the confusion in the notes).
  2. For the income, let us assume the average income over the last three years.

Best,
Kaveri
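Kaveri's answer fixes the income basis but not the loan cap, so the cap below uses a hypothetical `income_multiplier` as a placeholder. A minimal sketch, assuming income histories are ordered oldest first:

```python
from statistics import mean


def lending_income(income_history: list[float], years: int = 3) -> float:
    """Average income over the last `years` years, as agreed in this thread."""
    if len(income_history) < years:
        raise ValueError(f"need at least {years} years of income")
    return mean(income_history[-years:])


def max_jlg_loan(income_history: list[float], income_multiplier: float) -> float:
    """Hypothetical cap on what a JLG member can borrow from a microfinance institution."""
    return income_multiplier * lending_income(income_history)
```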
