Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Translate R plotting scripts into Python and add get_version.py #17

Merged
merged 34 commits into from
Sep 20, 2022
Merged

Translate R plotting scripts into Python and add get_version.py #17

merged 34 commits into from
Sep 20, 2022

Conversation

YeChen-IDM
Copy link
Contributor

@YeChen-IDM YeChen-IDM commented Jun 9, 2022

5 scripts under review:

  • helpers_likelihood_and_metrics.py
  • helpers_plot_ref_sim_comparisons.py
  • helpers_coordinate_each_relationship.py
  • helpers_reformat_sim_ref_dfs.py
  • run_generate_validation_comparisons_site.py

We will need to resolve some functions before we can run and test the main script, like the ~lm(), arrange(), etc.

Here is one of the plot from python script(please note that the ref_year is not grouped in the last plot):
site_compare_incidence_age
Here is the original plot from R scripts:
site_compare_incidence_age

@YeChen-IDM YeChen-IDM changed the title Translate R scripts into Python Translate R plotting scripts into Python Jun 9, 2022
@@ -0,0 +1,64 @@
from plotnine import ggplot, aes, geom_line, geom_point, theme_bw, xlab, ylab, scale_color_manual, facet_wrap
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a Python script that translated from these archived R script, no need to review.

@@ -0,0 +1,144 @@
#####################################################################################
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a Python script that translated from these archived R script, no need to review.

@@ -88,11 +88,11 @@ def plot_par_dens_ref_sim_comparison(age_agg_sim_df, ref_df):
# colors = brewer.pal(n=num_colors, name='BrBG')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a Python script that translated from these archived R script, no need to review.

@YeChen-IDM
Copy link
Contributor Author

Error about conflict dependencies when installing our package:

ERROR: Cannot install idmtools-cli and malaria-model-validation because these package versions have conflicting dependencies.
The conflict is caused by:
    datar 0.0.0 depends on pandas<2.0 and >=1.2
    plotnine 0.1.0 depends on pandas>=0.19.0
    idmtools 1.6.6 depends on pandas<1.2 and >=1.1.4

Pandas 1.2 is released on Dec 2020, not sure if idmtools can remove the requirement for pandas < 1.2

@YeChen-IDM
Copy link
Contributor Author

Error about conflict dependencies when installing our package:

ERROR: Cannot install idmtools-cli and malaria-model-validation because these package versions have conflicting dependencies.
The conflict is caused by:
    datar 0.0.0 depends on pandas<2.0 and >=1.2
    plotnine 0.1.0 depends on pandas>=0.19.0
    idmtools 1.6.6 depends on pandas<1.2 and >=1.1.4

Pandas 1.2 is released on Dec 2020, not sure if idmtools can remove the requirement for pandas < 1.2

Clinton will remove this pandas<1.2 requirement in their upcoming release.

@YeChen-IDM YeChen-IDM changed the title Translate R plotting scripts into Python Translate R plotting scripts into Python and add get_version.py Jul 5, 2022
# Conflicts:
#	create_plots/helpers_coordinate_each_relationship.R
#	report/archive/Malaria_model_validation_output.pdf
#	report/archive/Malaria_model_validation_output_10-05-2022_13-12-33.pdf
#	requirements.txt
#	simulations/manifest.py
loglik_df_asex_bench = get_dens_loglikelihood(combined_df=combined_df_asex, sim_column='benchmark')
loglik_df_asex_bench.rename(columns={"loglikelihood": "benchmark_loglike_asex"}, inplace=True)
# todo: need review, this dataframe is created but not used. Should we use it in line 234(loglik_df = pd.merge(...))
loglikelihood_comparison = pd.merge(loglik_df_asex, loglik_df_asex_bench, how="outer")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MAmbrose-IDM, this dataframe is created but not used. Should we use it in line 234(loglik_df = pd.merge(...))

coord_csv[coord_csv['site'] == cur_site]['infectiousness_to_mosquitos_ref'].iloc[0])
ref_df_cur = pd.read_csv(filepath_ref)
# todo: local variable 'upper_ages' is assigned to but never used
upper_ages = sorted(sim_df_cur['agebin'].unique())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 errors need to be fixed

  • local variable 'upper_ages' is assigned to but never used,
  • undefined name 'sim_df_cur'

@YeChen-IDM
Copy link
Contributor Author

Update Python plotting script according to R script fix in #19, commit 1713364, see details in #22

mean_diff_df = mean_diff_df[mean_diff_df['change_abs_diff'].notnull()]
#todo: need code review in the following line:
# R code: mean_diff_df$abs_diff_changed = abs(mean_diff_df$change_abs_diff)/mean_diff_df$mean_abs_diff_bench > rel_change_threshold
mean_diff_df['abs_diff_changed'] = abs(mean_diff_df['change_abs_diff']) / mean_diff_df['mean_abs_diff_bench'] > rel_change_threshold
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need code review here

mean_diff_df['change_type'] = 'better'
mean_diff_df[mean_diff_df['change_abs_diff'] < 0]['change_type'] = 'worse'
mean_diff_df[~mean_diff_df['abs_diff_changed']]['change_type'] = 'similar'

Copy link
Collaborator

@MAmbrose-IDM MAmbrose-IDM Sep 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I ran a quick test, it looked like ['change_type'] should come before [mean_diff_df['change_abs_diff'] or [~mean_diff_df['abs_diff_changed']] for this to work:

`mean_diff_df['change_type'][mean_diff_df['change_abs_diff'] < 0] = 'worse'

mean_diff_df['change_type'][~mean_diff_df['abs_diff_changed']] = 'similar'`

MAmbrose-IDM and others added 5 commits September 8, 2022 13:00
Use python stats.lingregress() function in a for loop in place of the nest() approach used in R for calculating linear regression / correlation information between the reference and simulation values. Note: edited section was tested separately, but still needs to be tested in context.
@YeChen-IDM YeChen-IDM mentioned this pull request Sep 20, 2022
@YeChen-IDM
Copy link
Contributor Author

Results in Python(left) vs. results in R(right):
image

@YeChen-IDM YeChen-IDM merged commit cbb7305 into InstituteforDiseaseModeling:main Sep 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants