Charts and Graphs #14

Open
abigailmondin opened this issue Nov 17, 2023 · 31 comments

Comments

@abigailmondin
Contributor

Here we want to upload any charts or graphs we create (ex. pie charts, histograms, etc.). You may want to include a description, brief analysis, or explain the importance of the chart/graph.

@ltsippel
Contributor


Rate memory pd101 pie Graph

Here is the Rate Memory pie chart for variable pd101. I am still figuring out how to label the legend with words instead of numbers.
I used: graph pie, over(pd101) plabel(_all percent)
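
One way to get words in the legend is to attach value labels to pd101 before drawing the chart. A minimal sketch, assuming the dataset is already in memory; the label name pd101_lbl and the label text are placeholders, so check the codebook for what each code of pd101 actually means:

```stata
* Placeholder label text: replace with the wording from the codebook
label define pd101_lbl 1 "Excellent" 2 "Very good" 3 "Good" 4 "Fair" 5 "Poor"
label values pd101 pd101_lbl

* The legend now shows the label text instead of the numeric codes
graph pie, over(pd101) plabel(_all percent)
```

Because the labels live on the variable, any later graph or table that uses pd101 picks them up as well.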

@kbuzard
Contributor

kbuzard commented Nov 22, 2023

@ecn310/diop A pie chart is good for understanding each variable on its own. At this point, you need to start using stats / visuals that link together the key variables in your hypothesis. Dylan or I are happy to help brainstorm this if you're not sure how to get started.

@abigailmondin
Contributor Author

pz262 (pw dementia) over pz216 (r years of education)

pz262_over_pz216
Code: graph bar (count) pz262 if pz262 == 1, over (pz216)

Description: This bar graph shows the number of respondents who answered "yes" to having dementia, by the number of years of education they received. I think it makes sense to see a spike at 12 years because 12 years of schooling means the person made it through elementary, middle, and high school. In this particular group of respondents, there doesn't seem to be a direct correlation between the number of years of schooling someone received and developing dementia.

@abigailmondin
Contributor Author

abigailmondin commented Nov 30, 2023

Frequency Histogram of pz216 (r years of education)

frequency_histogram_pz216
Code: histogram pz216, frequency

Description: This is a frequency histogram that displays pz216 (r years of education). I believe this will be helpful to visually demonstrate the range of years of schooling that the respondents of our data have received. This will also help to put into context the way the bar graph above spikes at 12 years of schooling.

Updated x-axis labels

frequency_histogram_pz216
@kbuzard I've updated the graph to label each bar along the x-axis, which I did using the graph editor in Stata. It still seems kind of weird to me because the ticks don't line up exactly with the bars.
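
The ticks may be off because histogram picks its own bins when it treats the variable as continuous. A possible fix, assuming pz216 takes only whole-number values from 0 to 17: the discrete option gives each value its own bar, and xlabel() puts a labeled tick at every year.

```stata
* One bar per whole year, with a labeled tick under each bar
histogram pz216, discrete frequency xlabel(0(1)17)
```

This keeps the labeling in the do-file, so the graph does not need to be touched up in the graph editor each time.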

@abigailmondin
Contributor Author

pz261 (pw Alzheimer's) over pz216 (r years of education)

pz261_over_pz216
Code: graph bar (count) pz261 if pz261 == 1, over (pz216)

Description: This is a bar graph similar to the dementia bar graph created above. This graph shows the respondents who answered "yes" to having Alzheimer's over the number of years of schooling they received. Similarly to the dementia graph, this bar graph also has a major spike at 12 years of education, which I believe makes sense considering the frequency histogram of pz216.

@kbuzard
Contributor

kbuzard commented Dec 1, 2023

@abigailmondin You might want to consider making a graph that has two bars for each category: one for the people with dementia and one for the people without. I think you'd take out the "if" statement and add another "over( )" for the pz261 variable, but I'm not 100% on that.
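
A sketch of that idea, assuming the dataset is in memory. With (count) and no variable list, graph bar counts the observations in each group. The first over() splits by pz261 response, the second by years of education, and asyvars gives each pz261 response its own color and legend entry. Every code of pz261 gets a bar, so an if condition may still be needed to drop codes such as "don't know" or "refused".

```stata
* Side-by-side counts of each pz261 response within each year of education
graph bar (count), over(pz261) over(pz216) asyvars
```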

@kbuzard
Contributor

kbuzard commented Dec 1, 2023

@abigailmondin Here's some code that I think should help (it's from ChatGPT, so buyer beware):

// Percentage of respondents in each category of pz262.
// With (percent) and no variable list, graph bar computes the
// share of observations in each over() group, so no separate
// tabulate or egen step is needed.
graph bar (percent), over(pz262) ///
  title("Percentage Distribution by Category") ///
  ytitle("Percentage") ///
  bar(1, color(blue)) ///
  legend(off)

@abigailmondin
Contributor Author

abigailmondin commented Dec 5, 2023

pd554 (get lost in familiar places) over pz216 (r years of education)

pd554_over_pz216
Code:
generate pd554_yes = 0
replace pd554_yes = 1 if pd554 == 1
generate pd554_no = 0
replace pd554_no = 1 if pd554 == 5
graph bar pd554_yes pd554_no, over (pz216)

Description: This bar graph shows pd554 (get lost in familiar places) separated into those who responded "yes" (this is labeled "pd554_yes") and those who responded "no" (this is labeled "pd554_no") over pz216 (r years of education). Overall, pd554_no is much higher than pd554_yes at each year of education. But there does not seem to be a direct link between years of education and getting lost in familiar places.

  • Update: Now that I've fixed the code thanks to Professor Buzard, this is the new graph with proper code. It still seems as though there is no specific pattern between years of education and getting lost in familiar places. However, the difference between "yes" responses and "no" responses is not as drastic as I initially thought.

@kbuzard
Contributor

kbuzard commented Dec 6, 2023

@abigailmondin I think you need replace pd554_no = 1 if pd554 == 5. I'm pretty sure the red bars are about five times higher than the blue bars because of this.
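
For context, graph bar plots the mean of each variable unless another statistic is requested, so a 0/1 indicator shows the share of respondents in each education group, and any value other than 1 scales the bar by that value. A compact way to build the indicators, as a sketch: the names lost_yes and lost_no are made up for illustration, and it assumes 1 means yes and 5 means no, as in the code above. Respondents with any other code are left missing instead of being counted as 0 in both variables.

```stata
* 1/0 indicators defined only for respondents who answered yes (1) or no (5)
generate byte lost_yes = (pd554 == 1) if inlist(pd554, 1, 5)
generate byte lost_no  = (pd554 == 5) if inlist(pd554, 1, 5)

* Bars show the share of yes and no answers at each year of education
graph bar (mean) lost_yes lost_no, over(pz216)
```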

@abigailmondin
Contributor Author

@kbuzard Thank you for catching that! I believe you were correct about the red bars being five times higher than the blue bars. I've corrected the code, created and attached the correct graph, and added an update to the description.

@abigailmondin
Contributor Author

abigailmondin commented Dec 6, 2023

pd554 (get lost in familiar places) over pc273 (ever had dementia)

pd554_over_pc273
Code: using the same generate statements as the pd554 over pz216 bar graph
graph bar pd554_yes pd554_no, over (pc273)

Description: Similar to the previous graph, this is a bar graph that shows pd554 (get lost in familiar places) separated into those who responded "yes" and those who responded "no" over pc273 (ever had dementia). The biggest spike seen on the graph is those who responded "yes" to both ever having dementia and getting lost in familiar places. I included the codebook for pc273 to help understand what the values 1, 3, 4, 5, 8, and 9 across the bottom of the graph are referring to.

  • Codebook for pc273:
    • 1 = Yes
    • 3 = Disputes previous wave record, but now has condition
    • 4 = Disputes previous wave record, does not have condition
    • 5 = No
    • 8 = DK (don't know), NA (not ascertained)
    • 9 = Refused

Potential fix (based on feedback)

@kbuzard Are these the kind of changes you were suggesting we make to the graphs? I used the graph editor to make these changes because I really struggled to find code that would do the kind of thing we discussed. If this isn't what you were thinking, would you be able to help me find a way to accomplish what the graph should ideally look like?

updated_pd554_over_pc273

@abigailmondin
Contributor Author

pv009 (forgetful during daily activities) over pz216 (r years of education)

pv009_over_pz216
Code:
generate pv009_yes = 0
replace pv009_yes = 1 if pv009 == 1
generate pv009_no = 0
replace pv009_no = 1 if pv009 == 5
graph bar pv009_yes pv009_no, over (pz216)

Description: This is a bar graph that shows pv009 (forgetful during daily activities) separated into those who responded "yes" and those who responded "no" over pz216 (r years of education). In this particular graph, there seems to be a relatively steady increase in those who responded "no" as the number of years of schooling increased. This correlates with our hypothesis that the more schooling a person gets the less likely they are to develop dementia as the increased amount of schooling stimulates your brain and increases cognitive health.

@abigailmondin
Contributor Author

pv009 (forgetful during daily activities) over pc273 (ever had dementia)

pv009_over_pc273
Code: using the same generate statements as the pv009 over pz216 bar graph
graph bar pv009_yes pv009_no, over (pc273)

Description: Similar to the previous graph, this bar graph shows pv009 (forgetful during daily activities) separated into those who responded "yes" and those who responded "no" over pc273 (ever had dementia). The largest spike in this graph is those who responded "no" to being forgetful during daily activities and "no" to ever having dementia. There is also a significant spike at those who responded "yes" to being forgetful during daily activities and "yes" to ever having dementia. I included the codebook again for the variable pc273 to help understand the values 1, 3, 4, 5, 8, and 9 across the bottom of the graph.

  • Codebook for pc273:
    • 1 = Yes
    • 3 = Disputes previous wave record, but now has condition
    • 4 = Disputes previous wave record, does not have condition
    • 5 = No
    • 8 = DK (don't know), NA (not ascertained)
    • 9 = Refused

@sophiehaber
Contributor

Ever had dementia vs college degree
Code: tabulate pc273 pb016, row col
Ever had dementia vs high school diploma
Code: tabulate pc273 pb015, row col

Code for exporting tables:

  1. ssc install outreg2 (installs the "outreg2" package)
  2. outreg2 ____ ____ using ctab.doc, replace cross noaster

@xorabear
Contributor

xorabear commented Dec 12, 2023

pc272 ever had alzheimers over pz216 years of education

image

Code:
/*
pz216 years of education
pc272 ever had alzheimers
pb014 highest level of education
yes if response is 1 or 3
no if response is 4 or 5
*/
cd "C:\Users\aartis\OneDrive - Syracuse University\Documents\GitHub\course-project-diop"

generate pc272_yes = 0
replace pc272_yes = 1 if (pc272 == 1 | pc272 == 3)
generate pc272_no = 0
replace pc272_no = 1 if (pc272 == 4 | pc272 == 5)
graph bar pc272_yes pc272_no, over (pz216)

@xorabear
Contributor

xorabear commented Dec 12, 2023

pc272 ever had alzheimers over pb014 highest level of education

image

Code:
/*
pz216 years of education
pc272 ever had alzheimers
pb014 highest level of education
yes if response is 1 or 3
no if response is 4 or 5
*/
cd "C:\Users\aartis\OneDrive - Syracuse University\Documents\GitHub\course-project-diop"

generate pc272_yes = 0
replace pc272_yes = 1 if (pc272 == 1 | pc272 == 3)
generate pc272_no = 0
replace pc272_no = 1 if (pc272 == 4 | pc272 == 5)
graph bar pc272_yes pc272_no, over (pb014)

Scale for responses:
0. No formal education
1-11. Grades
12. High school
13-15. Some college
16. College grad
17. Post college (17+ years)
97. Other
98. DK (Don't Know); NA (Not Ascertained)
99. RF (Refused)
I'm working on editing these out so that the results include fewer people.
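
One option for leaving codes out without changing the data is an if condition on the graph command. A sketch, assuming the codes to drop are 97, 98, and 99 and that pc272_yes and pc272_no already exist from the code above:

```stata
* Keep only respondents whose pb014 is an actual education level
graph bar pc272_yes pc272_no if !inlist(pb014, 97, 98, 99), over(pb014)
```

The same condition can be added to tabulate or summarize so that the tables and the graph describe the same set of respondents.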

@abigailmondin
Contributor Author

Updated graphs (still not 100% perfect)

pd554 (get lost in familiar places) over pz216 (r years of education)

updated_pd554_over_pz216
Code: graph bar pd554_yes pd554_no, over (pz216) percent stack legend(position(12) rows(2) label(1 "respondents who get lost in familiar places") label(2 "respondents who don't get lost in familiar places")) blabel(total, format(%9.0f))

pv009 (forgetful during daily activities) over pz216 (r years of education)

updated_pv009_over_pz216
Code: graph bar pv009_yes pv009_no, over (pz216) percent stack legend(position(12) rows(2) label(1 "forgetful during daily activities") label(2 "not forgetful during daily activities")) blabel(total, format(%9.0f))

pd554 (get lost in familiar places) over pc273 (ever had dementia)

updated_pd554_over_pc273
Code: graph bar pd554_yes pd554_no, over (pc273, label(angle(45)) relabel(1 "Yes" 2 "Now has condition" 3 "Now doesn't have condition" 4 "No" 5 "Don't know" 6 "Refused")) percent stack legend(position(12) rows(2) label(1 "respondents who get lost in familiar places") label(2 "respondents who don't get lost in familiar places")) blabel(total)

pv009 (forgetful during daily activities) over pc273 (ever had dementia)

updated_pv009_over_pc273
Code: graph bar pv009_yes pv009_no, over (pc273, label(angle(45)) relabel(1 "Yes" 2 "Now has condition" 3 "Now doesn't have condition" 4 "No" 5 "Don't know" 6 "Refused")) percent stack legend(position(12) rows(2) label(1 "forgetful during daily activities") label(2 "not forgetful during daily activities")) blabel(total)

Overall update: I developed the code for the changes that I had made using the graph editor.

@kbuzard or @eldreddyl
Things I still need help coding:

  • Adding a label for the x axis on all four graphs, when I try using xtitle it gives me an error message
  • Getting the x axis labels for the two graphs over pc273 (ever had dementia) to not be cut off

I've included all of the code I wrote to produce each graph for your reference, hopefully we can figure this out.

@kbuzard
Contributor

kbuzard commented Dec 13, 2023

@eldreddyl Can you help @abigailmondin with this? I have to concentrate on giving feedback on everyone's analysis sections and writing two exams so am unlikely to have time until the weekend.

@eldreddyl

  • Adding a label for the x axis on all four graphs, when I try using xtitle it gives me an error message

@abigailmondin Could you attach a screenshot of the error message?

@abigailmondin
Contributor Author

@eldreddyl Here is the screenshot of the error message I get when trying to use xtitle.
IMG_1377

@eldreddyl

@abigailmondin

So I played around with the code and read through the 'graph bar' documentation. To me, it doesn't seem like 'xtitle' is supported for bar graphs.

Screenshot 2023-12-13 122025

You can try two other options. I think either would work, so it may be up to your preference

  1. Use Stata's Graph Editor to add the label yourself. The downside here is you'd have to do this for each graph you make manually and I don't think there is an easy way to make this reproducible

  2. Add a descriptive title to the graph using title("Years of Education by Respondent Type")
    This method is reproducible since it would be in your do file. It would also implicitly describe your x-axis. I believe this is what the documentation meant by 'irrelevant for bar charts.'

Give that a try and let me know if Stata is still giving you trouble

While you do that, I'll look into the pc273 cutoff issue
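
Another option that may be worth trying is b1title(), which places a title below the plot region and so reads like an x-axis title. A sketch based on the pd554 over pz216 code above; the title text is only a suggestion, and the /// line continuations need to be run from a do-file:

```stata
* b1title() adds a title under the bars, standing in for xtitle()
graph bar pd554_yes pd554_no, over(pz216) percent stack ///
    b1title("Years of education") ///
    legend(position(12) rows(2) ///
        label(1 "respondents who get lost in familiar places") ///
        label(2 "respondents who don't get lost in familiar places")) ///
    blabel(total, format(%9.0f))
```

Like title(), this stays in the do-file, so it is reproducible without the graph editor.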

@eldreddyl

@abigailmondin So I ran your code to generate the pc273 graphs. At least when I ran it, the labels weren't cutoff. I don't think it's a Stata issue. I would look into your method of saving the images.

You could

  1. Screenshot the graph
  2. Save using the Stata Graph editor
  3. Use graph export in your code

@abigailmondin
Contributor Author

@eldreddyl When I re-run the code for the pc273 graphs the labels are still cutoff. Is it possible there is a different issue?

@eldreddyl

@eldreddyl When I re-run the code for the pc273 graphs the labels are still cutoff. Is it possible there is a different issue?

@abigailmondin what method are you using to save them?

@abigailmondin
Contributor Author

abigailmondin commented Dec 15, 2023

what method are you using to save them?

@eldreddyl I'm clicking the save icon, making it a png, and saving them to a folder. But even when I just run the code without saving them the labels are cutoff.

@xorabear
Contributor

xorabear commented Dec 16, 2023

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       pz216 |     16,259     12.7292    3.241885          0         17

Obs is the number of people with a recorded value for this variable. The mean shows that, on average, people have around 12.7 years of education. The standard deviation measures how spread out the values are around the mean; here it is about 3.2 years, so a typical respondent is roughly 3 years above or below 12.7 years of education. The variable max means the

   ever had |
   dementia |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        500        2.43        2.43
          3 |          1        0.00        2.43
          4 |         51        0.25        2.68
          5 |     20,030       97.23       99.91
          8 |         17        0.08       99.99
          9 |          2        0.01      100.00
------------+-----------------------------------
      Total |     20,601      100.00

1. YES
3. DISPUTES PREVIOUS WAVE RECORD, BUT NOW HAS CONDITION
4. DISPUTES PREVIOUS WAVE RECORD, DOES NOT HAVE CONDITION
5. NO
8. DK (Don't Know); NA (Not Ascertained)
9. RF (Refused)

These numbers correspond to the response codes in the data. The tabulation counts how many respondents gave each answer, and it shows that most people answered no. It covers dementia only, so on its own it does not show the relationship between education and dementia. Frequency and percent show how many people report dementia within the data set: 500 respondents, or 2.43%, answered yes.

. pwcorr pz216 pc273, sig

             |    pz216    pc273
-------------+------------------
       pz216 |   1.0000
             |
       pc273 |   0.0496   1.0000
             |   0.0000

pwcorr reports the correlation between the two variables. The p-value is 0.0000. Since this is less than 0.05, the correlation between these two variables is statistically significant. The correlation coefficient ranges from -1 to 1: -1 is a perfect negative relationship, 0 is no relationship, and 1 is a perfect positive relationship. Here the coefficient is 0.0496, which is close to 0, so the relationship is very weak even though it is statistically significant. Summarize and tab1 provide background for pwcorr and help with the interpretation.

@kbuzard
Contributor

kbuzard commented Dec 16, 2023

@xorabear Remember that you need to ask for the significance level of the correlation, so you need to add ", sig" to the end of the pwcorr command.

@xorabear
Contributor

xorabear commented Dec 16, 2023

@xorabear Remember that you need to ask for the significance level of the correlation, so you need to add ", sig" to the end of the pwcorr command.

should i keep summarize and tab1 I thought that might help understand pwcorr, sig @kbuzard

@kbuzard
Contributor

kbuzard commented Dec 16, 2023

should i keep summarize and tab1 I thought that might help understand pwcorr, sig

@xorabear tabulating the education variable is the same as a graph you already have, and it is harder to read than looking at the graph, so I suggest you only keep the graph. The results of summarize (and maybe also including a median) would be good to include in your data section.

summarize on the dementia variable gives you statistics that are not really meaningful; the average of the codes that represent different answers doesn't mean anything. The key information in tabulate is useful for this variable, that is, we see that someone identifying as having dementia is quite rare.
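
The same concern applies to correlating the raw codes. A sketch of one way to handle both points, assuming the dataset is in memory: the detail option adds the median to summarize, and a 0/1 indicator gives pwcorr something it can interpret. The name has_dementia is made up for illustration, and grouping codes 1 and 3 as "has dementia" and 4 and 5 as "does not" is an assumption based on the codebook text.

```stata
* Median (50th percentile) and other detail for years of education
summarize pz216, detail

* Hypothetical indicator: 1 = has dementia (codes 1, 3), 0 = does not (codes 4, 5)
* "Don't know" (8) and "refused" (9) stay missing
generate byte has_dementia = 1 if inlist(pc273, 1, 3)
replace has_dementia = 0 if inlist(pc273, 4, 5)

* Correlation between years of education and the indicator, with p-value
pwcorr pz216 has_dementia, sig
```

With the indicator, the mean from summarize is also meaningful: it is the share of respondents who report dementia.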

@abigailmondin
Contributor Author

At least when I ran it, the labels weren't cutoff.

@eldreddyl I'm still unable to see the labels when I run the code. If you are able to see the labels, is there any way you could save them or screenshot them and attach them here?

@eldreddyl

@abigailmondin For the purposes of your project, it would be better for your group if you submitted the cutoff-label graphs vs having me attach them. That way you won't lose as many points on the reproducibility section of the rubric.

Here are some other options in the meantime:

  1. Experiment with using the remote desktop vs. the physical computers in Eggers 040. As long as the room isn't booked, you should still have access this week
  2. See if anything changes when one of your group members saves the graphs
  3. Play around with the graph export command. I'm not sure if there is some limitation on the graph size that is causing it to be cutoff
