Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cohort Characterization - denominators for subgroups #2420

Closed
alondhe opened this issue Dec 18, 2020 · 5 comments
Closed

Cohort Characterization - denominators for subgroups #2420

alondhe opened this issue Dec 18, 2020 · 5 comments

Comments

@alondhe
Copy link
Contributor

alondhe commented Dec 18, 2020

Expected behavior

When using subgroups in cohort characterization, the denominators used for percents should be for each subgroup.

Actual behavior

When using cohort event date ranges as subgroups (Jan-Apr, May-July, etc), the percents are using the total cohort as the denominator rather than the subgroup patients.

Steps to reproduce behavior

Create a cohort characterization, add some features, and add some subgroups.

@ssuvorov-fls
Copy link
Contributor

@alondhe
Do you mean that we should take only those events from observation_period which have observation_period_start_date >= start date in subgroup and observation_period_end_date <= emd date in subgroup

@ssuvorov-fls
Copy link
Contributor

@alondhe
And could you give the valid characterization with prepared subgroups?

@alondhe
Copy link
Contributor Author

alondhe commented Aug 5, 2021

@ssuvorov-fls what I'm saying is that the proportions (percents) listed when using subgroups should use the size of the subgroup as the denominator. So if we have a cohort of COVID patients, and we split into subgroups of January, February, and March, the percents in the subgroup columns should be the number of patients with the feature divided by the number of patients with index in that subgroup's month.

@ssuvorov-fls
Copy link
Contributor

Code for creating cohort events

INSERT
into
	qualified_events_17 (event_id,
	person_id,
	start_date,
	end_date,
	op_start_date,
	op_end_date)
select
	row_number() over (partition by E.subject_id
order by
	E.cohort_start_date) as event_id,
	E.subject_id as person_id,
	E.cohort_start_date as start_date,
	E.cohort_end_date as end_date,
	OP.observation_period_start_date as op_start_date,
	OP.observation_period_end_date as op_end_date
from
	results_atlas_280.temp_cohort_wbx8684h E
join cdm_531.observation_period OP on
	E.subject_id = OP.person_id
	and E.cohort_start_date >= OP.observation_period_start_date
	and E.cohort_start_date <= OP.observation_period_end_date
where
	cohort_definition_id = 77049;

Code for creating table of persons events for subgroup

select
	77049 as cohort_definition_id,
	17 as strata_id,
	q.person_id as subject_id,
	q.start_date as cohort_start_date,
	q.end_date as cohort_end_date
from
	qualified_events_17 q
join (
	select
		person_id
	from
		(
		-- Begin Criteria Group
		select
			0 as index_id,
			person_id,
			event_id
		from
			(
			select
				E.person_id,
				E.event_id
			from
				qualified_events_17 E
			inner join (
				-- Begin Correlated Criteria
				select
					0 as index_id,
					cc.person_id,
					cc.event_id
				from
					(
					select
						p.person_id,
						p.event_id
					from
						qualified_events_17 P
					join (
						-- Begin Condition Occurrence Criteria
						select
							C.person_id,
							C.condition_occurrence_id as event_id,
							C.condition_start_date as start_date,
							coalesce(C.condition_end_date, (C.condition_start_date + 1 * interval '1 day')) as end_date,
							C.visit_occurrence_id,
							C.condition_start_date as sort_date
						from
							(
							select
								co.*
							from
								cdm_531.CONDITION_OCCURRENCE co ) C
						where
							C.condition_start_date > TO_DATE(TO_CHAR(2009, '0000')|| '-' || TO_CHAR(1, '00')|| '-' || TO_CHAR(1, '00'), 'YYYY-MM-DD')
							and C.condition_end_date < TO_DATE(TO_CHAR(2009, '0000')|| '-' || TO_CHAR(12, '00')|| '-' || TO_CHAR(1, '00'), 'YYYY-MM-DD')
							-- End Condition Occurrence Criteria
) A on
						A.person_id = P.person_id
						and A.START_DATE >= P.OP_START_DATE
						and A.START_DATE <= P.OP_END_DATE
						and A.START_DATE >= P.OP_START_DATE
						and A.START_DATE <= P.OP_END_DATE
						) cc
				group by
					cc.person_id,
					cc.event_id
				having
					COUNT(cc.event_id) >= 1
					-- End Correlated Criteria
) CQ on
				E.person_id = CQ.person_id
				and E.event_id = CQ.event_id
			group by
				E.person_id,
				E.event_id
			having
				COUNT(index_id) = 1 ) G
			-- End Criteria Group
) st
	group by
		person_id) sti on
	sti.person_id = q.person_id;

@alondhe
Copy link
Contributor Author

alondhe commented Aug 31, 2021

Not seeing this issue after re-testing with v2.10.

@alondhe alondhe closed this as completed Aug 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants