Title

Tools for Conditional Probability

Description

Allows users to calculate conditional probabilities across defined ranges in any numeric data frame. Unlike general conditional probability packages, which require numeric data to be converted into categorical data first, this toolkit handles numeric data in ranges directly, so conditional probabilities can be calculated without a manual conversion step. For example, suppose a data frame named df has two numeric columns, sleep_hour and age. To find P(sleep_hour ≥ 8.5 | age) with R's built-in functions, we would typically first categorize age into groups such as "young", "middle-aged", and "old". This package handles that categorization internally: the call calc_cond_prob(df, "sleep_hour >= 8.5 ~ age", range_list=list(3)) splits age into three groups and returns the result.
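For contrast, here is a minimal base-R sketch of the manual route described above: categorize age by hand with cut(), then take the share of rows meeting the condition in each group. The data values, the name sleep_df, and the age cut-offs are invented for illustration only.

```r
# Invented example data for the sleep_hour/age illustration
sleep_df <- data.frame(
  sleep_hour = c(9.0, 7.5, 8.5, 6.0, 8.0, 9.5, 7.0, 8.5),
  age        = c(12, 35, 70, 48, 22, 80, 55, 15)
)

# Step 1: categorize age by hand; these cut-offs are arbitrary choices
sleep_df$age_group <- cut(sleep_df$age,
                          breaks = c(0, 30, 60, Inf),
                          labels = c("young", "middle-aged", "old"),
                          right  = FALSE)

# Step 2: the share of rows with sleep_hour >= 8.5 in each group
# estimates P(sleep_hour >= 8.5 | age group)
cond_prob <- tapply(sleep_df$sleep_hour >= 8.5, sleep_df$age_group, mean)
print(cond_prob)
```

With rangecondprob, the hand-made categorization in step 1 is replaced by the range_list argument of calc_cond_prob.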

Installation

Install rangecondprob from CRAN, then load it:

install.packages("rangecondprob")
library(rangecondprob)

Example

Sample Data

The examples below use the following data frame, named df:

df <- data.frame(
  exam_math_score = c(85, 78, 90, 92, 70, 88, 95),
  exam_lang_score = c(80, 88, 85, 82, 77, 68, 55),
  age = c(16, 17, 18, 19, 16, 17, 18),
  height = c(150, 160, 165, 170, 155, 158, 172),
  weight = c(45, 60, 62, 67, 50, 55, 68),
  income = c(3000, 3200, 3500, 4000, 2600, 3100, 3900)
)

Find P(exam_lang_score ≥ 80 | age) (QUICK)

We find P(exam_lang_score ≥ 80 | age), with age split into three groups. The function returns a list containing (1) the results of the calculation, (2) a data frame of the high and low odds extracted from those results, and (3) the range list used for the calculation. Outliers are removed before the calculation. In the result tables, hit is the number of rows in a group where the event occurs, total is the number of rows in that group, and odd = hit / total is the estimated conditional probability.

calc_cond_prob(df, "exam_lang_score >= 80 ~ age ", range_list=list(3))

Sample Result

age	hit	total	odd
16:17	1	2	0.5
17:18	1	2	0.5
18:19	1	2	0.5
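As a rough illustration of what splitting a column "into three groups" means, the base-R sketch below divides age into three equal-width intervals with cut(breaks = 3) and tabulates hit, total and odd. This is not the package's implementation: calc_cond_prob removes outliers and chooses its own boundaries, so its counts (the sample result above) differ from this sketch's.

```r
# Sample data from the README (only the two columns this query needs)
df <- data.frame(
  exam_lang_score = c(80, 88, 85, 82, 77, 68, 55),
  age             = c(16, 17, 18, 19, 16, 17, 18)
)

event     <- df$exam_lang_score >= 80   # TRUE where the event occurs
age_group <- cut(df$age, breaks = 3)    # three equal-width age intervals

# hit = rows where the event occurs, total = rows in the group, odd = hit / total
hit   <- tapply(event, age_group, sum)
total <- tapply(event, age_group, length)
result <- data.frame(age   = levels(age_group),
                     hit   = as.vector(hit),
                     total = as.vector(total),
                     odd   = as.vector(hit / total))
print(result)
```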

Find P(exam_lang_score ≥ 80 | age) (SPECIFIC RANGE 1)

We find P(exam_lang_score ≥ 80 | age), where the age groups are defined by explicit breaks: each pair of consecutive breaks bounds one group. As above, the return value is a list.

calc_cond_prob(df, "exam_lang_score >= 80 ~ age ", range_list=list(c(16, 16.5, 17.5, 18.5, 19.5)))

Sample Result

age	hit	total	odd
16:16.5	1	2	0.5
16.5:17.5	1	2	0.5
17.5:18.5	1	2	0.5
18.5:19.5	1	1	1.0
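Assuming each group is a left-closed interval [lower, upper) (an assumption; the README does not say how boundaries are treated), the sample result can be reproduced in base R with cut() and the breaks 16, 16.5, 17.5, 18.5, 19.5 implied by the table. This is a sketch of the idea, not the package's code.

```r
df <- data.frame(
  exam_lang_score = c(80, 88, 85, 82, 77, 68, 55),
  age             = c(16, 17, 18, 19, 16, 17, 18)
)

# Left-closed intervals [16,16.5), [16.5,17.5), [17.5,18.5), [18.5,19.5)
age_group <- cut(df$age, breaks = c(16, 16.5, 17.5, 18.5, 19.5), right = FALSE)
event <- df$exam_lang_score >= 80

result <- data.frame(
  age   = levels(age_group),
  hit   = as.vector(tapply(event, age_group, sum)),    # rows meeting the event
  total = as.vector(tapply(event, age_group, length))  # rows in the interval
)
result$odd <- result$hit / result$total
print(result)
```

The hit, total and odd columns match the sample result: 0.5 for the first three intervals and 1.0 for the last.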

Find P(exam_lang_score ≥ 80 | age) (SPECIFIC RANGE 2)

We find P(exam_lang_score ≥ 80 | age) for two specific age groups only: 16 ≤ age < 16.5 and 18.5 ≤ age < 19.5. In this case, the ranges are given as a list of lower/upper pairs, list(c(16, 16.5), c(18.5, 19.5)), wrapped in range_list.

calc_cond_prob(df, "exam_lang_score >= 80 ~ age", range_list=list(list(c(16, 16.5), c(18.5, 19.5))))

Sample Result

age	hit	total	odd
16:16.5	1	2	0.5
18.5:19.5	1	1	1.0
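A base-R sketch of the same idea, assuming each pair is a half-open range [lower, upper): rows outside every listed range (here, ages 17 and 18) are simply ignored. The ranges variable is illustrative; this is not the package's implementation.

```r
df <- data.frame(
  exam_lang_score = c(80, 88, 85, 82, 77, 68, 55),
  age             = c(16, 17, 18, 19, 16, 17, 18)
)

ranges <- list(c(16, 16.5), c(18.5, 19.5))   # [lower, upper) pairs
event  <- df$exam_lang_score >= 80

# One row per range: count only the observations that fall inside it
result <- do.call(rbind, lapply(ranges, function(r) {
  in_range <- df$age >= r[1] & df$age < r[2]
  data.frame(age   = paste0(r[1], ":", r[2]),
             hit   = sum(event[in_range]),
             total = sum(in_range),
             stringsAsFactors = FALSE)
}))
result$odd <- result$hit / result$total
print(result)
```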

Find P(exam_lang_score ≥ 80 | age and height and weight and income)

We find P(exam_lang_score ≥ 80 | age and height and weight and income), where age, height, weight, and income are split into 3, 4, 4, and 4 groups, respectively.

calc_cond_prob(df, "exam_lang_score >= 80 ~ age + height + weight + income", range_list=list(3, 4, 4, 4))

Sample Result

age	height	weight	income	hit	total	odd
16:17	150:156.48	45:52.49	2600:3049.7	1	2	0.5
17:18	156.5:159.98	52.5:59.99	3050:3199.68	0	1	0.0
17:18	160:167.48	60:64.49	3200:3699.63	1	1	1.0
18:19	160:167.48	60:64.49	3200:3699.63	1	1	1.0
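The sketch below shows joint grouping in base R: each conditioning column is cut into its own number of equal-width groups, and hit and total are counted for every combination that occurs in the data. The equal-width cuts are an assumption for illustration; the package's boundaries and outlier handling differ, so the combinations will not match the table above.

```r
df <- data.frame(
  exam_lang_score = c(80, 88, 85, 82, 77, 68, 55),
  age    = c(16, 17, 18, 19, 16, 17, 18),
  height = c(150, 160, 165, 170, 155, 158, 172),
  weight = c(45, 60, 62, 67, 50, 55, 68),
  income = c(3000, 3200, 3500, 4000, 2600, 3100, 3900)
)

n_groups <- c(age = 3, height = 4, weight = 4, income = 4)

# Cut each conditioning column into its own number of equal-width groups
groups <- Map(function(col, n) cut(df[[col]], breaks = n),
              names(n_groups), n_groups)

# One indicator per row; summing gives hit (event rows) and total (all rows)
grouped <- data.frame(groups,
                      hit   = as.integer(df$exam_lang_score >= 80),
                      total = 1L)

# aggregate() keeps only the combinations that actually occur
result <- aggregate(cbind(hit, total) ~ age + height + weight + income,
                    data = grouped, FUN = sum)
result$odd <- result$hit / result$total
print(result)
```

With only seven rows, most combinations contain a single observation, so odd values of 0 or 1 should be read with that in mind.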

Conduct a further analysis

Summarize an existing result over all combinations of age and height. The call below gives P(exam_lang_score ≥ 80 | age), P(exam_lang_score ≥ 80 | height), and P(exam_lang_score ≥ 80 | age and height).

res <- calc_cond_prob(df, "exam_lang_score >= 80 ~ age + height + weight + income", range_list=list(3, 4, 4, 4))
shortSummary(res[[1]], "age + height ", combination=1)

RESULT 1

age	hit	total	odd
16:17	1	2	0.5
17:18	1	2	0.5
18:19	1	1	1.0

Attention: this P(exam_lang_score ≥ 80 | age) differs from the one returned by calc_cond_prob(df, "exam_lang_score >= 80 ~ age ", range_list=list(3)), because it is derived from the result of calc_cond_prob(df, "exam_lang_score >= 80 ~ age + height + weight + income", range_list=list(3, 4, 4, 4)).

RESULT 2

height	hit	total	odd
150:156.48	1	2	0.5
156.5:159.98	0	1	0.0
160:167.48	2	2	1.0

RESULT 3

age	height	hit	total	odd
16:17	150:156.48	1	2	0.5
17:18	156.5:159.98	0	1	0.0
17:18	160:167.48	1	1	1.0
18:19	160:167.48	1	1	1.0
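The single-variable tables can be rebuilt from RESULT 3 by summing hit and total over the other variable. This is a sketch of one plausible way such marginal tables are derived; how shortSummary works internally is not described here, and the helper name marginal is hypothetical.

```r
# RESULT 3 entered by hand
combined <- data.frame(
  age    = c("16:17", "17:18", "17:18", "18:19"),
  height = c("150:156.48", "156.5:159.98", "160:167.48", "160:167.48"),
  hit    = c(1L, 0L, 1L, 1L),
  total  = c(2L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sum hit and total over every other variable
marginal <- function(tab, var) {
  f <- as.formula(paste("cbind(hit, total) ~", var))
  out <- aggregate(f, data = tab, FUN = sum)
  out$odd <- out$hit / out$total
  out
}

by_age    <- marginal(combined, "age")     # hit/total: 1/2, 1/2, 1/1
by_height <- marginal(combined, "height")  # hit/total: 1/2, 0/1, 2/2
print(by_age)
print(by_height)
```

Summed this way, the age table agrees with RESULT 1, and no group can have more hits than rows.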

Filter out the result

Use the goodchance function to filter each summary table by its odd values, given upper and lower thresholds. With upper=0.7 and lower=0.25, only the rows whose odd is 1.0 remain in the sample results below.

res <- calc_cond_prob(df, "exam_lang_score >= 80 ~ age + height + weight + income", range_list=list(3, 4, 4, 4))
summary_result_list <- shortSummary(res[[1]], "age + height ", combination=1)
lapply(summary_result_list, goodchance, upper=0.7, lower=0.25)

RESULT 1

age	hit	total	odd
18:19	1	1	1.0

RESULT 2

height	hit	total	odd
160:167.48	2	2	1.0

RESULT 3

age	height	hit	total	odd
17:18	160:167.48	1	1	1.0
18:19	160:167.48	1	1	1.0
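For readers who want to apply a similar threshold without the package, the sketch below uses a hypothetical helper, keep_high_odds, that keeps rows whose odd is at least upper. It is NOT goodchance: it ignores lower and only mirrors the sample output for RESULT 3; the exact rule goodchance applies is not described here.

```r
# Hypothetical helper, not the package's goodchance():
# keep only rows whose odd is at least `upper`
keep_high_odds <- function(tab, upper) {
  tab[tab$odd >= upper, , drop = FALSE]
}

# RESULT 3 from the further-analysis example, entered by hand
combined <- data.frame(
  age    = c("16:17", "17:18", "17:18", "18:19"),
  height = c("150:156.48", "156.5:159.98", "160:167.48", "160:167.48"),
  hit    = c(1, 0, 1, 1),
  total  = c(2, 1, 1, 1),
  stringsAsFactors = FALSE
)
combined$odd <- combined$hit / combined$total

filtered <- keep_high_odds(combined, upper = 0.7)
print(filtered)
```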

Advanced Use

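The left-hand side of the formula can combine several conditions. Below, the event is exam_lang_score >= 80 | exam_math_score >= 80, that is, either score is at least 80, with age and income split into 3 and 4 groups, respectively.

calc_cond_prob(df, formula_string="exam_lang_score >= 80 | exam_math_score >= 80 ~ age + income", range_list=list(3, 4))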
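A base-R sketch of how such an event can be evaluated, assuming the left-hand side is an ordinary R logical expression, so that | means element-wise "or" rather than "given". For brevity, the sketch conditions on each distinct age instead of on age and income ranges; it is not the package's implementation.

```r
df <- data.frame(
  exam_math_score = c(85, 78, 90, 92, 70, 88, 95),
  exam_lang_score = c(80, 88, 85, 82, 77, 68, 55),
  age             = c(16, 17, 18, 19, 16, 17, 18)
)

# `|` is R's element-wise OR: TRUE when at least one score is >= 80
event <- df$exam_lang_score >= 80 | df$exam_math_score >= 80

# Condition on each distinct age value (no ranges, to keep the sketch short)
hit   <- tapply(event, df$age, sum)
total <- tapply(event, df$age, length)
odd   <- hit / total
print(odd)
```

Rows 6 and 7 fail the language condition but pass the math one, so ages 17 and 18 reach an odd of 1.0 here.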

Licence

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details: https://www.gnu.org/licenses/.
