## **OLDA data exploration using DBeaver**

Rayid Ghani, Frauke Kreuter, Julia Lane, Brian Kim, Adrianne Bradford, Alex Engler, Nicolas Guetta Jeanrenaud, Graham Henke, Daniela Hochfellner, Clayton Hunter, Avishek Kumar, Jonathan Morgan, Ursula Kaczmarek, Benjamin Feder, Ekaterina Levitskaya, Lina Osorio-Copete, Tian Lou.

#### 1. Open **DBeaver**
<br>
<img src="images/DBeaver_open_1.png" />
<br><br>

#### 2. Expand the navigation window to explore the **``appliedda``** database, then double-click **``schemas``**
<br>
<img src="images/Dbeaver_schemas_2.png" />
<br><br>

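Once a SQL editor is open (step 7), you can also retrieve the list of schemas with a query. This is a minimal sketch that uses the standard `information_schema.schemata` view and assumes **``appliedda``** is a PostgreSQL database. In PostgreSQL this view shows only the schemas you own or hold a privilege on, so your list may differ from other users' lists.

```sql
-- List the schemas visible to you in the current database (appliedda).
-- information_schema is part of the SQL standard and is available in PostgreSQL.
SELECT schema_name
FROM information_schema.schemata
ORDER BY schema_name;
```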
#### 3. Scroll down in the ``Database Navigator`` and double-click the schema **``data_ohio_olda_2018``** to open the **ER Diagram**
<br>
<img src="images/Dbeaver_OLDA_3.png" />
<br><br>

#### 4. Click the `Properties` tab to see useful information, such as the names of the tables in **``data_ohio_olda_2018``** and the number of rows in each table
<br>
<img src="images/Dbeaver_OLDA_properties_4_1.png" />
<br><br>

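You can also approximate the row counts from the **Properties** tab in SQL. This minimal sketch assumes a PostgreSQL back end. The statistics view `pg_stat_user_tables` stores an estimate (`n_live_tup`) that can lag behind the true count, so use `count(*)` on a single table when you need an exact number.

```sql
-- Estimated number of rows per table in the OLDA schema (PostgreSQL statistics view).
-- n_live_tup is an estimate maintained by the statistics collector, not an exact count.
SELECT relname    AS table_name,
       n_live_tup AS estimated_rows
FROM pg_stat_user_tables
WHERE schemaname = 'data_ohio_olda_2018'
ORDER BY relname;

-- Exact row count for a single table (slower on large tables).
SELECT count(*) AS num_rows
FROM data_ohio_olda_2018.oh_otc;
```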
#### 5. Double-click one of the tables (e.g., **``oh_otc``**) to view its properties, data, and ER Diagram
<br>
<img src="images/Dbeaver_table_ otc_data_5_1.png" />
<br><br>

#### 6. Open another table by clicking on the ``tables`` down arrow 
<br>
<img src="images/Dbeaver_tables_6.png" />
<br><br>

#### 7. Create a script: click ``New SQL Editor`` in the toolbar
<br>
<img src="images/Dbeaver_newSQLeditor_7.png" />
<br><br>

#### 8. SQL select statement to display the list of tables included in the **``data_ohio_olda_2018``** schema
<br>
<img src="images/Dbeaver_ListTables_OLDA_8.png" />
<br>
Press <strong><kbd>Ctrl+Enter</kbd></strong> to run the query. 
<br><br>

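The screenshot above is the only place this statement appears. Below is a copyable sketch that returns the same kind of list from the standard `information_schema.tables` view. It may not match the statement in the image exactly.

```sql
-- List the tables (and any views) in the data_ohio_olda_2018 schema.
SELECT table_name,
       table_type          -- e.g. 'BASE TABLE' or 'VIEW'
FROM information_schema.tables
WHERE table_schema = 'data_ohio_olda_2018'
ORDER BY table_name;
```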
#### 9. Save and rename the script. Select the script in the `project-General` window ...
 <br>
<img src="images/Dbeaver_ListScripts_9.png" />
<br>

<p>    ... then press <strong><kbd>F2</kbd></strong> to replace the script's default name.</p>
<br>
<img src="images/Dbeaver_listtables_10.png" />
<br><br>

#### 10. SQL select statement to display the list of variables (columns) in the ``oh_otc`` table
<br>
<img src="images/Dbeaver_oh_otc_columns_11.png" />
<br><br>

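As in step 8, the statement itself appears only in the screenshot. This sketch uses the standard `information_schema.columns` view and may differ from the image. The `data_type` column also shows whether a field such as `course_end_date_y` is stored as text or as a number, which matters for comparisons like `course_end_date_y = '2015'` in the queries below.

```sql
-- List the variables (columns) of oh_otc with their data types, in table order.
SELECT column_name,
       data_type
FROM information_schema.columns
WHERE table_schema = 'data_ohio_olda_2018'
  AND table_name   = 'oh_otc'
ORDER BY ordinal_position;
```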
#### 11. SQL select statement to display ``oh_otc`` data
<br>
<img src="images/select_oh_otc.png" />
<br><br>

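To keep the preview small on a large table, add a `LIMIT`. This sketch assumes PostgreSQL's `LIMIT` syntax and may differ from the statement in the screenshot. Without an `ORDER BY`, the database chooses which rows to return.

```sql
-- Preview 100 rows of oh_otc without retrieving the whole table.
SELECT *
FROM data_ohio_olda_2018.oh_otc
LIMIT 100;
```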
#### **Other SQL select statements to subset the ``oh_otc`` data**

-- **How many students completed at least one course at an Ohio Technical Center (OTC) in each quarter of 2015?**

    select course_end_date_y as year,
           case
               when course_end_date_m in (1, 2, 3)    then 1
               when course_end_date_m in (4, 5, 6)    then 2
               when course_end_date_m in (7, 8, 9)    then 3
               when course_end_date_m in (10, 11, 12) then 4
           end as quarter,
           -- distinct: a student who completes several courses in a quarter is counted once
           count(distinct ssn_hash) as num_completers
    from data_ohio_olda_2018.oh_otc
    where student_result = 1           -- 1 = completer
      and course_end_date_y = '2015'
    group by course_end_date_y, quarter
    order by quarter;
   
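A student can complete more than one course, so counting rows and counting distinct `ssn_hash` values can give different answers. This quick check applies the same filters as the completion query above and returns both numbers.

```sql
-- Compare completion records with distinct completers for 2015.
SELECT count(*)                 AS num_completion_records,
       count(DISTINCT ssn_hash) AS num_distinct_completers
FROM data_ohio_olda_2018.oh_otc
WHERE student_result = 1          -- 1 = completer
  AND course_end_date_y = '2015';
```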
-- **Number of students by completion year and subject**

    WITH course_subject AS (
        SELECT a.ssn_hash, subject_desc, a.course_end_date_y
        FROM data_ohio_olda_2018.oh_otc AS a
        JOIN data_ohio_olda_2018.oh_subject_codes_lkp AS b
          ON a.hei_subject_code = b.subject_code
        WHERE a.course_end_date_y IS NOT NULL
          AND a.student_result = 1              -- completers only
    )
    SELECT CASE
               WHEN course_end_date_y = '20' THEN '2020'   -- recode two-digit years
               WHEN course_end_date_y = '17' THEN '2017'
               ELSE course_end_date_y
           END AS year,
           subject_desc,
           count(DISTINCT ssn_hash) AS num_students
    FROM course_subject
    GROUP BY year, subject_desc
    ORDER BY year, num_students DESC;

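The `CASE` expression above recodes the two-digit years `'20'` and `'17'` to four digits, which suggests that `course_end_date_y` is not coded consistently. Before relying on that recoding, it helps to list every value that occurs. This sketch assumes the column is stored as text, as the comparisons with `'20'` and `'17'` imply.

```sql
-- Frequency of each raw value of course_end_date_y, to spot codes that need cleaning.
SELECT course_end_date_y,
       count(*) AS num_records
FROM data_ohio_olda_2018.oh_otc
GROUP BY course_end_date_y
ORDER BY course_end_date_y;
```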
-- **Number of students by completion year and region**

    select case
               when course_end_date_y = '20' then '2020'   -- recode two-digit years
               when course_end_date_y = '17' then '2017'
               else course_end_date_y
           end as year,
           case region
               when 1  then 'CentralOhio'
               when 2  then 'NorthwestOhio'
               when 4  then 'WestOhio'
               when 5  then 'SouthwestOhio'
               when 6  then 'NorthOhio'
               when 7  then 'SouthOhio'
               when 8  then 'NortheastOhio'
               when 9  then 'EastOhio'
               when 11 then 'SoutheastOhio'
           end as region_name,   -- not named "region", so group by cannot confuse it with the code column
           count(distinct ssn_hash) as num_students
    from data_ohio_olda_2018.oh_otc
    where region is not null
      and course_end_date_y is not null
      and student_result = 1               -- completers only
    group by year, region_name
    order by year, region_name;

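The region labels cover codes 1, 2, 4 to 9, and 11. Any other code falls through the `CASE` and gets a `NULL` label. To see which codes actually occur, run the check below. Like the other queries in this guide, it assumes the table lives in the `data_ohio_olda_2018` schema.

```sql
-- Distribution of raw region codes among completers;
-- codes missing from the CASE expression above would show up here.
SELECT region,
       count(*) AS num_records
FROM data_ohio_olda_2018.oh_otc
WHERE student_result = 1
GROUP BY region
ORDER BY region;
```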
#### **Other data queries**

-- **Number of enrolled students in Ohio community colleges by term**

    SELECT enroll_yr_num AS year,
           enroll_term AS term,
           -- DISTINCT guards against counting a student more than once per term
           COUNT(DISTINCT a.ssn_hash) AS num_students
    FROM data_ohio_olda_2018.oh_hei_long AS a
    JOIN data_ohio_olda_2018.oh_hei_campus_county_lkp AS b
      ON a.enroll_campus = b.campus_num
    WHERE a.enroll_campus IS NOT NULL
      AND campus_type_code IN ('TC', 'SC', 'CC')  -- community colleges
    GROUP BY enroll_yr_num, enroll_term
    ORDER BY enroll_yr_num, enroll_term;
    
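The filter `campus_type_code in ('TC', 'SC', 'CC')` treats those three codes as community colleges. To see every campus type in the lookup table, run the sketch below. It assumes, as the join above suggests, that `campus_type_code` is a column of `oh_hei_campus_county_lkp` with one row per `campus_num`.

```sql
-- Campus types in the lookup table and the number of campuses of each type.
SELECT campus_type_code,
       count(*) AS num_campuses
FROM data_ohio_olda_2018.oh_hei_campus_county_lkp
GROUP BY campus_type_code
ORDER BY campus_type_code;
```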
-- **How many Ohio students graduated from a community college in 2015?**

    SELECT a.degcert_yr_earned AS year,
           a.degcert_term_earned AS term,
           COUNT(DISTINCT a.ssn_hash) AS num_students
    FROM data_ohio_olda_2018.oh_hei_long AS a
    JOIN data_ohio_olda_2018.oh_hei_campus_county_lkp AS b
      ON a.degcert_campus = b.campus_num
    WHERE a.degcert_campus IS NOT NULL
      AND campus_type_code IN ('TC', 'SC', 'CC')  -- community colleges
      AND a.degcert_yr_earned = '2015'
    GROUP BY a.degcert_yr_earned, a.degcert_term_earned
    ORDER BY a.degcert_yr_earned, a.degcert_term_earned;

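The graduation query above reports graduates by term. Adding the term counts together can double-count anyone who earned credentials in more than one 2015 term. For a single yearly total, count distinct `ssn_hash` values over the whole year with the same filters:

```sql
-- One distinct count of 2015 community-college graduates across all terms.
SELECT count(DISTINCT a.ssn_hash) AS num_grads_2015
FROM data_ohio_olda_2018.oh_hei_long AS a
JOIN data_ohio_olda_2018.oh_hei_campus_county_lkp AS b
  ON a.degcert_campus = b.campus_num
WHERE a.degcert_campus IS NOT NULL
  AND campus_type_code IN ('TC', 'SC', 'CC')   -- community colleges
  AND a.degcert_yr_earned = '2015';
```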
-- **How many people who graduated from an Ohio community college in 2015 were working in Ohio in 2016?**

    with emp_oh as (
             -- people with at least one Ohio UI wage record in 2016
             select distinct ssn_hash as ssn
             from data_ohio_olda_2018.oh_ui_wage_by_quarter
             where year = '2016'),
         oh_cc_grad_2015 as (
             -- 2015 community-college graduates
             select distinct a.ssn_hash as ssn
             from data_ohio_olda_2018.oh_hei_long a
             join data_ohio_olda_2018.oh_hei_campus_county_lkp b
               on a.degcert_campus = b.campus_num
             where a.degcert_campus is not null
               and campus_type_code in ('TC', 'SC', 'CC')  -- community colleges
               and a.degcert_yr_earned = '2015')
    select count(distinct ssn) as num_emp_oh_2016
    from oh_cc_grad_2015
    where ssn in (select ssn from emp_oh);
  
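A count is easier to interpret next to the number of graduates. A `LEFT JOIN` from graduates to the 2016 Ohio wage records returns the number of graduates, the number employed, and the share employed in one result. This sketch reuses the definitions from the Ohio employment query, so "employed" means at least one 2016 record in `oh_ui_wage_by_quarter`.

```sql
-- Share of 2015 Ohio community-college graduates with at least one Ohio wage record in 2016.
WITH emp_oh AS (
    SELECT DISTINCT ssn_hash AS ssn
    FROM data_ohio_olda_2018.oh_ui_wage_by_quarter
    WHERE year = '2016'
),
oh_cc_grad_2015 AS (
    SELECT DISTINCT a.ssn_hash AS ssn
    FROM data_ohio_olda_2018.oh_hei_long AS a
    JOIN data_ohio_olda_2018.oh_hei_campus_county_lkp AS b
      ON a.degcert_campus = b.campus_num
    WHERE a.degcert_campus IS NOT NULL
      AND campus_type_code IN ('TC', 'SC', 'CC')   -- community colleges
      AND a.degcert_yr_earned = '2015'
)
SELECT count(g.ssn) AS num_grads_2015,
       count(e.ssn) AS num_emp_oh_2016,   -- count() skips the NULLs left by unmatched graduates
       round(100.0 * count(e.ssn) / nullif(count(g.ssn), 0), 1) AS pct_emp_oh_2016
FROM oh_cc_grad_2015 AS g
LEFT JOIN emp_oh AS e
  ON g.ssn = e.ssn;
```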
-- **How many students who graduated from an Ohio community college in 2015 were working in Illinois in 2016?**

    with emp_il as (
             -- UNION removes duplicate ssn values across the four 2016 quarters
             select ssn from il_des_kcmo.il_wage_2016q1
             union
             select ssn from il_des_kcmo.il_wage_2016q2
             union
             select ssn from il_des_kcmo.il_wage_2016q3
             union
             select ssn from il_des_kcmo.il_wage_2016q4),
         oh_cc_grad_2015 as (
             -- 2015 community-college graduates
             select distinct a.ssn_hash as ssn
             from data_ohio_olda_2018.oh_hei_long a
             join data_ohio_olda_2018.oh_hei_campus_county_lkp b
               on a.degcert_campus = b.campus_num
             where a.degcert_campus is not null
               and campus_type_code in ('TC', 'SC', 'CC')  -- community colleges
               and a.degcert_yr_earned = '2015')
    select count(distinct ssn) as num_emp_il_2016
    from oh_cc_grad_2015
    where ssn in (select ssn from emp_il);

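The sketch below combines the two employment questions. It sorts the 2015 graduates into four groups: 2016 wage records in Ohio only, in Illinois only, in both states, or in neither. It reuses the definitions from the Ohio and Illinois queries and relies on PostgreSQL's `FILTER` clause for conditional counts. Like the Illinois query, it assumes that Illinois `ssn` values are hashed the same way as Ohio's `ssn_hash`. "Neither" only means no record in these wage files, not necessarily that the person was out of work.

```sql
-- 2015 Ohio community-college graduates by where they have 2016 wage records.
WITH emp_oh AS (
    SELECT DISTINCT ssn_hash AS ssn
    FROM data_ohio_olda_2018.oh_ui_wage_by_quarter
    WHERE year = '2016'
),
emp_il AS (
    -- UNION deduplicates, so each ssn appears at most once
    SELECT ssn FROM il_des_kcmo.il_wage_2016q1
    UNION
    SELECT ssn FROM il_des_kcmo.il_wage_2016q2
    UNION
    SELECT ssn FROM il_des_kcmo.il_wage_2016q3
    UNION
    SELECT ssn FROM il_des_kcmo.il_wage_2016q4
),
oh_cc_grad_2015 AS (
    SELECT DISTINCT a.ssn_hash AS ssn
    FROM data_ohio_olda_2018.oh_hei_long AS a
    JOIN data_ohio_olda_2018.oh_hei_campus_county_lkp AS b
      ON a.degcert_campus = b.campus_num
    WHERE a.degcert_campus IS NOT NULL
      AND campus_type_code IN ('TC', 'SC', 'CC')   -- community colleges
      AND a.degcert_yr_earned = '2015'
)
-- Every CTE has one row per ssn, so the joins keep one row per graduate.
SELECT count(*) FILTER (WHERE o.ssn IS NOT NULL AND i.ssn IS NULL)     AS oh_only,
       count(*) FILTER (WHERE o.ssn IS NULL     AND i.ssn IS NOT NULL) AS il_only,
       count(*) FILTER (WHERE o.ssn IS NOT NULL AND i.ssn IS NOT NULL) AS both_states,
       count(*) FILTER (WHERE o.ssn IS NULL     AND i.ssn IS NULL)     AS neither
FROM oh_cc_grad_2015 AS g
LEFT JOIN emp_oh AS o ON g.ssn = o.ssn
LEFT JOIN emp_il AS i ON g.ssn = i.ssn;
```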
Here's a link to a DBeaver [tutorial](https://github.com/dbeaver/dbeaver/wiki/Application-Window-Overview) if you would like additional resources.