<a href="https://colab.research.google.com/github/SergeyHSE/LinearRegressor.github.io/blob/main/RegressionAnalysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.stats.diagnostic as dg
import scipy.stats
from scipy.stats import boxcox


In the literature on this topic, the two most interesting articles were "Economic efficiency of beef cattle production in Thailand" by Professor Suneeporn Suwanmaneepong (Faculty of Agricultural Technology, King Mongkut's Institute of Technology Ladkrabang) and "Assessment of technical efficiency and its determinants in beef cattle production in Kenya" by Eric Ruto of Lincoln University. In the first paper, the author describes the economic efficiency of livestock production. To build her model she uses the following variables as the most important: the cost of feed and additives, equipment, drugs and labor, access to priority markets, etc. Unfortunately, our data contain no information on spending on veterinary drugs, so we cannot analyze its effect on the profitability of the enterprise directly. Instead, we subtract from the cost price all the cost items that we do have. The remainder is an amount that includes the cost of veterinary drugs.

Moreover, both authors conclude that government intervention is needed, with different types of assistance such as:
- Improving farmers' access to the knowledge they need to develop their farms as well as their farming skills
- Providing access to more modern technologies
- Improving access to market services
- Creating opportunities for off-farm income generation.

All of these factors are related, in one way or another, to government support, that is, to some type of subsidy. According to the authors, such support should directly improve profit margins, and should therefore also improve the model's performance.


In [19]:
from google.colab import files
file = files.upload()

Saving agro_census.dta to agro_census.dta


In [24]:
data = pd.read_stata('agro_census.dta')
data.columns, data.shape

(Index(['NPPP', 'COD_COATO', 'KFS', 'KOPF', 'OKVED', 'land_total',
        'cost_milk_KRS', 'cost_KRS_food', 'cost_meat_KRS', 'AB_1', 'CF_1',
        'short_credit', 'long_credit', 'debit_debt', 'credit_debt',
        'gov_sup_plant', 'gov_sup_seed', 'gov_sup_grain', 'subs_plant',
        'subs_grain', 'gov_sup_farming', 'gov_sup_KRS', 'subs_prod_farm',
        'subs_milk', 'subs_meat', 'subs_KRS', 'subs_combikorm', 'sub_chemistry',
        'subs_fuel', 'farms_number', 'profit_farms_number',
        'unprofit_farms_number', 'capital', 'profit', 'unprofit', 'J', 'O',
        'empl_org', 'empl_prod', 'V', 'W', 'X', 'AN', 'AO', 'AP', 'AQ', 'AR',
        'BE', 'BF', 'BG', 'BQ', 'BR', 'BS', 'BT', 'BU', 'BY', 'BZ', 'CA',
        'salary_plant', 'salary_farm', 'DB', 'DC', 'DF', 'DG', 'DH', 'DI', 'DK',
        'DO', 'DT', 'EC', 'EG', 'EJ', 'EK', 'ER', 'ES', '_merge'],
       dtype='object'),
 (6287, 76))

In [26]:
df = data[(data['OKVED'] == '01.21')]
df.shape

(2595, 76)

In [27]:
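# Net profit: reported profit minus reported loss
df['net_profit'] = df['profit'] - df['unprofit']
# Profitability: net profit relative to revenue (BR)
df['rentabel'] = df['net_profit'] / df['BR']
# Residual cost: cost price (BQ) minus the itemized costs we have
df['other_cost'] = df['BQ'] - df['salary_farm'] - df['DC'] - df['DI']
# Total subsidies related to livestock production
df['subsidies'] = (df['gov_sup_KRS'] + df['subs_prod_farm'] + df['subs_milk']
                   + df['subs_KRS'] + df['subs_combikorm'] + df['subs_fuel'])
# Net debt: accounts payable minus accounts receivable
df['debt'] = df['credit_debt'] - df['debit_debt']
df['cost_services'] = df['J'] - df['BQ']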

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['net_profit'] = df['profit'] - df['unprofit']

In [28]:
df.rename(columns={'DC' : 'amortization',
                   'DI' : 'social_cost'}, inplace=True)
df['output'] = df['AP'] + df['BE'] + df['BS']

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df.rename(columns={'DC' : 'amortization',
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['output'] = df['AP'] + df['BE'] + df['BS']
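
The SettingWithCopyWarning messages above appear because `df` was created by boolean filtering of `data`, so pandas cannot tell whether the later assignments are meant to reach the original frame. Below is a minimal sketch of one common way to avoid the warning, assuming it is acceptable to work on an independent copy: call `.copy()` right after filtering and use `rename` without `inplace`. The toy frame is hypothetical and only mimics a few of the census columns.

```python
import warnings

import pandas as pd

# Hypothetical toy data standing in for the census file (values are made up).
data = pd.DataFrame({
    'OKVED': ['01.21', '01.11', '01.21'],
    'profit': [10.0, 5.0, 0.0],
    'unprofit': [0.0, 0.0, 4.0],
    'DC': [1.0, 2.0, 3.0],
    'DI': [0.5, 0.5, 0.5],
})

# .copy() makes df an independent frame, so later assignments are unambiguous.
df = data[data['OKVED'] == '01.21'].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')  # record every warning raised in this block
    df['net_profit'] = df['profit'] - df['unprofit']
    df = df.rename(columns={'DC': 'amortization', 'DI': 'social_cost'})

# Keep only the "copy of a slice" warnings, if any were raised.
copy_warnings = [w for w in caught if 'copy of a slice' in str(w.message)]
print(len(copy_warnings), 'copy-related warnings')
print(df[['net_profit', 'amortization', 'social_cost']])
```

The original `data` frame is left untouched by this approach, which also makes it safe to re-run the filtering cell later.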


In [31]:
column_names = ['net_profit', 'rentabel', 'other_cost', 'subsidies', 'debt',
                'cost_services', 'amortization', 'output', 'salary_farm',
                'empl_org', 'KOPF', 'social_cost']
livestock = df[column_names]
livestock.shape

(2595, 12)

We ended up with the following variables:
- net_profit: net profit of livestock production
- rentabel: profitability (ratio of net profit to revenue)
- other_cost: residual costs, which include, among other things, loan repayments and the purchase of veterinary drugs
- social_cost: deductions for social needs
- subsidies: total amount of subsidies, including subsidies for milk and meat production, fuel subsidies, etc.
- debt: current short-term debt (accounts payable minus accounts receivable)
- cost_services: cost of services and works sold (the cost of all goods, products, works and services sold minus the cost of livestock products sold)
- amortization: amortization (depreciation) charges
- output: gross output of milk, meat and cattle
- salary_farm: labor costs
- empl_org: average annual number of employees of the agricultural organization
- KOPF: legal form of the organization, coded as follows:
  - 42: Unitary enterprises based on the right of economic management
  - 47: Open joint stock companies
  - 52: Production cooperatives
  - 65: Limited liability companies
  - 67: Closed joint-stock companies
  - 54: Collective farms
  - 55: State farms
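
The definitions above can be checked on a small example. The sketch below rebuilds a few of the derived columns with `DataFrame.assign` on a hypothetical two-row frame (all numbers are made up). It reads BR as revenue, BQ as the cost of livestock products sold and J as the cost of all goods and services sold, which is inferred from the variable descriptions rather than from a codebook. Replacing zero revenue with NaN is an extra assumption, not something the notebook does; it keeps `rentabel` from becoming infinite.

```python
import numpy as np
import pandas as pd

# Hypothetical two-farm example; column names follow the notebook, values are made up.
toy = pd.DataFrame({
    'profit': [120.0, 0.0],
    'unprofit': [0.0, 30.0],
    'BR': [600.0, 0.0],          # revenue (the second farm reports none)
    'BQ': [500.0, 80.0],         # cost of livestock products sold
    'J': [560.0, 95.0],          # cost of all goods, works and services sold
    'salary_farm': [150.0, 20.0],
    'DC': [40.0, 5.0],           # amortization
    'DI': [30.0, 4.0],           # deductions for social needs
})

derived = toy.assign(
    net_profit=lambda d: d['profit'] - d['unprofit'],
    # Zero revenue becomes NaN so the ratio is missing rather than +/-inf.
    rentabel=lambda d: d['net_profit'] / d['BR'].replace(0, np.nan),
    # Residual cost: what is left after the itemized costs are deducted.
    other_cost=lambda d: d['BQ'] - d['salary_farm'] - d['DC'] - d['DI'],
    cost_services=lambda d: d['J'] - d['BQ'],
)

print(derived[['net_profit', 'rentabel', 'other_cost', 'cost_services']])
```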

In [32]:
livestock.head()

Unnamed: 0,net_profit,rentabel,other_cost,subsidies,debt,cost_services,amortization,output,salary_farm,empl_org,KOPF,social_cost
0,8931.0,0.189771,30701.0,3892.0,26477.0,48314.0,2114.0,2693.0,8735.0,294,47,1147
1,2495.0,0.123149,13939.0,3710.0,11271.0,3138.0,582.0,1678.0,3827.0,166,52,490
2,98.0,0.003601,16900.0,4096.0,18952.0,7885.0,572.0,3043.0,6790.0,235,67,1014
3,-4868.0,-1.044411,6460.0,207.0,10718.0,2083.0,162.0,668.0,1432.0,95,52,182
6,-3457.0,-0.503129,7502.0,825.0,3060.0,8783.0,0.0,332.0,1645.0,136,67,210
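
Because KOPF is a code rather than a quantity, it may be convenient to attach the labels listed earlier before modelling. This is a minimal sketch using the KOPF values from the five rows displayed above. The names `KOPF_LABELS` and `legal_form` are illustrative and not part of the notebook, and casting the codes to `int` is an assumption about how they are stored.

```python
import pandas as pd

# Code-to-label mapping taken from the variable description.
KOPF_LABELS = {
    42: 'Unitary enterprises, based on the right of economic management',
    47: 'Open joint stock companies',
    52: 'Production cooperatives',
    54: 'Collective farms',
    55: 'State farms',
    65: 'Limited liability companies',
    67: 'Closed joint-stock companies',
}

# KOPF codes of the five rows shown by livestock.head().
sample = pd.DataFrame({'KOPF': [47, 52, 67, 52, 67]}, index=[0, 1, 2, 3, 6])

# Cast to int (assumption: the stored codes are numeric or numeric strings),
# map to labels, and store as a categorical column.
sample['legal_form'] = (
    sample['KOPF'].astype(int).map(KOPF_LABELS).astype('category')
)

print(sample['legal_form'].value_counts())
```

Any code missing from the dictionary would map to NaN, so counting missing values in `legal_form` is a quick way to detect unexpected legal forms in the full data.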
