### ELT Project - Analysis Section

#### To Explore:
- Avg name length
- Most common names
- Closest people in db to each other
- Names of people by dob
- Easiest passwords to crack (least complex)
- Names with large discrepancies between actual and predicted ages

In [1]:
import duckdb
import pandas as pd
import ipykernel

In [2]:
# connect to duckdb
con = duckdb.connect("../transform/restore/random_people.duckdb")

# print all columns of pandas df
pd.set_option('display.max_columns', None)

## Top Passwords
##### I scored the password strength of everyone in my DuckDB database. The scoring system is basic, with a maximum of 4 points:
##### 1 point for containing an uppercase character, 1 point for a lowercase character, 1 point for a digit, and 1 point for a length of at least 8 characters.

#### No password earned all 4 points. While not great passwords in a traditional sense, the highest scorers (3 points each, none with an uppercase character) were:
## porsche9, 1x2zkg8w, cricket1, 1a2b3c4d, thunder1
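
The scoring rule described above can be sketched in plain Python. This is a minimal illustration rather than the transform code that populated the database: the function name `score_password` and the `min_length` parameter are assumptions, and the dictionary keys simply mirror the `password_has_*` columns that appear in `all_people`.

```python
def score_password(password: str, min_length: int = 8) -> dict:
    """Score a password from 0 to 4 using the four checks described above."""
    checks = {
        "has_upper": any(c.isupper() for c in password),    # at least one uppercase character
        "has_lower": any(c.islower() for c in password),    # at least one lowercase character
        "has_numeric": any(c.isdigit() for c in password),  # at least one digit
        "has_length": len(password) >= min_length,          # at least min_length characters
    }
    result = {name: int(passed) for name, passed in checks.items()}
    result["score"] = sum(result.values())  # one point per passed check
    return result


for pw in ["porsche9", "revoluti", "google"]:
    print(pw, score_password(pw)["score"])
```

Under this sketch, `porsche9` scores 3, `revoluti` scores 2, and `google` scores 1, which matches the `password_complexity_score` values stored for those passwords.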

In [44]:
top_passwords_query = """ 
select 
    full_name, 
    password 
from all_people 
where password_complexity_score = (select max(password_complexity_score) from all_people)
order by password_complexity_score desc
"""

top_passwords_query = con.execute(top_passwords_query).df()
top_passwords_query

Unnamed: 0,full_name,password
0,ثنا موسوی,porsche9
1,Anisa Denis,1x2zkg8w
2,Santana Pinto,cricket1
3,Iiris Ramo,1a2b3c4d
4,هلیا قاسمی,thunder1


## Longest First Names:
#### Mexico 🇲🇽 (7.3), Ukraine 🇺🇦 (7.1), Spain 🇪🇸 (6.9)

## Shortest First Names:
#### USA 🇺🇸 (5.2), Turkey 🇹🇷 (5.1), Iran 🇮🇷 (4.8)

In [43]:
name_length_query = """
with name_length as (
    select
        address_country, 
        avg(len(first_name)) as average_name_length,
        row_number() over (order by average_name_length desc) as longest_names, 
        row_number() over (order by average_name_length) as shortest_names 
    from people_mart 
    group by 1 
    order by 2 desc
)

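select
    address_country, 
    average_name_length 
from name_length 
where longest_names <= 3 or shortest_names <= 3
-- sort here as well: the ordering inside the CTE is not guaranteed to carry over
order by average_name_length desc
"""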

name_length_query = con.execute(name_length_query).df()
name_length_query

Unnamed: 0,address_country,average_name_length
0,Mexico,7.285714
1,Ukraine,7.125
2,Spain,6.866667
3,United States,5.2
4,Turkey,5.071429
5,Iran,4.846154


## Age Over Expected
#### **"Martha"** in my database is younger than predicted from the name (17 years under the predicted age)
#### **"Marcus"** is older (15 years over the predicted age)
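
As a quick check of the numbers above, the delta appears to be `age - predicted_age`, so a negative value means the person is younger than predicted. The sketch below recomputes it for the five people who have a predicted age. The `rows` list is copied from the query output in this notebook, and the helper name `age_delta` is an assumption for illustration.

```python
# (first_name, age, predicted_age) copied from the query output in this notebook
rows = [
    ("Martha", 59, 76),
    ("Eren", 34, 45),
    ("Theo", 58, 67),
    ("Kimberly", 46, 53),
    ("Marcus", 68, 53),
]


def age_delta(age: int, predicted_age: int) -> int:
    """Negative: younger than predicted. Positive: older than predicted."""
    return age - predicted_age


# Sort ascending by delta, as the SQL query does
deltas = sorted(
    ((name, age_delta(age, predicted)) for name, age, predicted in rows),
    key=lambda pair: pair[1],
)
print(deltas[0])   # most under the predicted age
print(deltas[-1])  # most over the predicted age
```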

In [42]:
age_deltas_query = """
select 
    first_name,
    age, 
    predicted_age, 
    age_delta 
from all_people 
where predicted_age is not null 
order by age_delta
"""
 
age_deltas_query = con.execute(age_deltas_query).df()
age_deltas_query

Unnamed: 0,first_name,age,predicted_age,age_delta
0,Martha,59,76,-17
1,Eren,34,45,-11
2,Theo,58,67,-9
3,Kimberly,46,53,-7
4,Marcus,68,53,15


## Current Predicted Activity
#### This uses the timezone offset fields from the random people API: the offset is added to (or subtracted from) the current timestamp at run time to estimate what each person in my database is doing right now. The results below are from my last run and will vary depending on run time. I did not add logic for weekends, so these people are working 56-hour weeks (8 hours x 7 days).

#### At the moment:
#### 68 doing leisure activities, 68 working, 66 sleeping, 20 commuting to or from work, 19 eating breakfast, 10 eating lunch
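
The offset arithmetic and the schedule lookup can be sketched in plain Python. This is an illustration under stated assumptions, not the notebook's SQL: the helper names `parse_offset` and `predict_activity` are hypothetical, the offsets are assumed to be relative to UTC, and the sign of an offset such as `-3:30` is applied to both the hours and the minutes. The cut-off times mirror the `case` expression in the query.

```python
from datetime import datetime, time, timedelta, timezone

# (inclusive upper bound of the local time, activity), mirroring the SQL case expression
SCHEDULE = [
    (time(6, 0), "Sleeping"),
    (time(8, 0), "Breakfast"),
    (time(9, 0), "Work Commute"),
    (time(12, 0), "Working"),
    (time(13, 0), "Lunch"),
    (time(17, 0), "Working"),
    (time(18, 0), "Work Commute"),
    (time(23, 0), "Leisure"),
]


def parse_offset(offset: str) -> timedelta:
    """Turn a string such as '-3:30' or '+8:00' into a signed timedelta."""
    sign = -1 if offset.startswith("-") else 1
    hours, minutes = offset.lstrip("+-").split(":")
    return sign * timedelta(hours=int(hours), minutes=int(minutes))


def predict_activity(offset: str, now_utc: datetime) -> str:
    """Estimate the activity at the person's local time, truncated to the minute."""
    local = (now_utc + parse_offset(offset)).time().replace(second=0, microsecond=0)
    for upper_bound, activity in SCHEDULE:
        if local <= upper_bound:
            return activity
    return "Sleeping"  # after 23:00


now_utc = datetime(2025, 6, 26, 12, 0, tzinfo=timezone.utc)  # fixed time for a repeatable example
for offset in ["-8:00", "-3:30", "+3:30", "+8:00"]:
    print(offset, predict_activity(offset, now_utc))
```

Passing `datetime.now(timezone.utc)` instead of the fixed timestamp gives a live estimate.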

In [40]:
person_activities_query = """
with person_activities as (
    select
    uuid,
    timezone_offset,
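    -- NOTE: current_localtime() is the time in the session time zone, so the offsets are only correct when that zone is UTC
    -- the sign of timezone_offset is applied to the minutes too, so that '-3:30' means -3h30m rather than -3h + 30m
    current_localtime()
        + cast(hour_tz_adjustment as int) * interval 1 HOUR
        + (case when timezone_offset like '-%' then -1 else 1 end) * cast(minute_tz_adjustment as int) * interval 1 MINUTE as person_local_timestamp,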
    substring(cast(person_local_timestamp as varchar), 1, 5) as local_time,
    case 
        when local_time <= '06:00' then 'Sleeping'
        when local_time <= '08:00' then 'Breakfast'
        when local_time <= '09:00' then 'Work Commute'
        when local_time <= '12:00' then 'Working'
        when local_time <= '13:00' then 'Lunch'
        when local_time <= '17:00' then 'Working'
        when local_time <= '18:00' then 'Work Commute'
        when local_time <= '23:00' then 'Leisure'
        when local_time <= '24:00' then 'Sleeping'
        else null
        end as predicted_current_activity
    from all_people
)

select predicted_current_activity, count(*) as num_doing_activity
from person_activities
group by 1
order by 2 desc
"""

person_activities_query = con.execute(person_activities_query).df()
person_activities_query

Unnamed: 0,predicted_current_activity,num_doing_activity
0,Working,68
1,Leisure,68
2,Sleeping,66
3,Work Commute,20
4,Breakfast,19
5,Lunch,10


In [45]:
q = """
select * from all_people ;
"""

q = con.execute(q).df()
q

Unnamed: 0,gender,title,first_name,last_name,address_street_number,address_street_name,address_city,address_state,address_country,address_postcode,address_latitude,address_longitude,timezone_offset,timezone_description,email,uuid,username,password,password_salt,md5,sha1,sha256,dob,age,registered_date,registered_age,phone,cell,ssn_type,ssn,picture_large,picture_medium,picture_thumbnail,nationality,processed_ts,predicted_age,full_name,password_length,password_has_upper,password_has_lower,password_has_numeric,password_has_length,password_complexity_score,good_password,age_delta,hour_tz_adjustment,minute_tz_adjustment
0,male,Mr,Albert,Gibson,7091,Preston Rd,Hayward,Tennessee,United States,81715,84.8829,-75.3620,-3:30,Newfoundland,albert.gibson@example.com,044ba6dc-57b7-407a-86b3-63bbd2e31a9f,saddog449,google,wCSBdEPr,8702de35ee868ad57f6984c6b9893104,e7659f02d19134a82099555bb00264a2dd5b53f9,eb0a8a221eef2dfda18b6b54fe57c2f5229e7437a758ee...,1956-07-21T12:42:38.468Z,68,2016-09-22T16:18:46.589Z,8,(819) 306-5984,(341) 522-7972,SSN,777-08-8183,https://randomuser.me/api/portraits/men/68.jpg,https://randomuser.me/api/portraits/med/men/68...,https://randomuser.me/api/portraits/thumb/men/...,US,2025-06-25T02:19:20.233471,,Albert Gibson,6,0,1,0,0,1,0,,-3,30
1,female,Mrs,Agafiya,Krizhanivskiy,6845,Tetyani Yablonskoyi,Radehiv,Kirovogradska,Ukraine,70135,78.9596,11.9206,-3:30,Newfoundland,agafiya.krizhanivskiy@example.com,07a4afb7-a92d-43ea-abb8-5d8330daec57,lazyfrog853,reflex,Mh1n6NPR,8221b21efce3b8cd7c0d4eb0b8671707,f91d273e3ca0dce174bbc0fb73d587e976fcfbca,5b015d45b1b474be9a1e2d5e0060aa3076ebccb1161db3...,1976-06-26T02:49:46.110Z,48,2006-09-06T20:43:50.540Z,18,(066) V68-8340,(067) L67-9760,,,https://randomuser.me/api/portraits/women/19.jpg,https://randomuser.me/api/portraits/med/women/...,https://randomuser.me/api/portraits/thumb/wome...,UA,2025-06-25T02:20:03.412266,,Agafiya Krizhanivskiy,6,0,1,0,0,1,0,,-3,30
2,female,Ms,ثنا,موسوی,4700,فداییان اسلام,دزفول,خراسان رضوی,Iran,73260,13.0241,70.1014,-4:00,"Atlantic Time (Canada), Caracas, La Paz",thn.mwswy@example.com,5ab47864-b91f-4826-aeb8-eb0d5c7d24e4,lazyswan758,porsche9,Je7Hjy36,9f88dc64bfe22d5095285a51c7991788,863c954c7b744519df8d8645ee40e76cbec0cb05,685c5c85d8ec0ae0e75f6d3d0ab6f26c893115eaf3df56...,1947-07-04T15:16:46.095Z,77,2019-04-19T13:44:53.116Z,6,074-81282613,0954-716-1755,,,https://randomuser.me/api/portraits/women/34.jpg,https://randomuser.me/api/portraits/med/women/...,https://randomuser.me/api/portraits/thumb/wome...,IR,2025-06-25T02:20:05.240165,,ثنا موسوی,8,0,1,1,1,3,1,,-4,00
3,male,Mr,پرهام,محمدخان,8169,میدان استقلال,مشهد,قزوین,Iran,80407,20.6568,-69.0577,+3:30,Tehran,prhm.mhmdkhn@example.com,f99dab8c-b47a-4cdb-9f86-70b796f8ef99,orangegoose502,yang,RpDQUZRt,3560b46b4041db0a9277b4426ae154e9,ff6061bf346a4cea3d9688d0088fe26b32ce963a,c834d474fb8908147e64090974da653022b2b14ea8af1d...,1946-06-06T16:19:52.963Z,79,2016-03-15T06:36:39.353Z,9,054-95895598,0993-526-6392,,,https://randomuser.me/api/portraits/men/46.jpg,https://randomuser.me/api/portraits/med/men/46...,https://randomuser.me/api/portraits/thumb/men/...,IR,2025-06-25T03:16:35.177492,,پرهام محمدخان,4,0,1,0,0,1,0,,+3,30
4,female,Miss,Saakje,Keesmaat,7958,Jhr C Roelllaan,Laaghalerveen,Overijssel,Netherlands,4428 UW,33.7162,154.4652,-8:00,Pacific Time (US & Canada),saakje.keesmaat@example.com,c522cd7b-55ef-4898-a3bd-748828ea7ed7,ticklishkoala334,revoluti,x50FMHOo,e7a6bc88488818daa19d17a66b019a01,4e0daa54df0f6d9807c46f36bfb784b60dc7211f,06dc905c5491dfcd2a78e4fcc60a1b7471b32afba0fe50...,1974-09-18T20:35:10.177Z,50,2013-04-03T08:01:20.178Z,12,(0371) 294387,(06) 84217365,BSN,64660565,https://randomuser.me/api/portraits/women/12.jpg,https://randomuser.me/api/portraits/med/women/...,https://randomuser.me/api/portraits/thumb/wome...,NL,2025-06-25T23:17:08.217881,,Saakje Keesmaat,8,0,1,0,1,2,0,,-8,00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
246,male,Mr,Marcus,Meyer,6232,Lone Wolf Trail,Orange,Western Australia,Australia,4405,-84.5414,-119.5374,+3:30,Tehran,marcus.meyer@example.com,a9997aff-4b61-43b2-bdae-b32d912bb6e8,redwolf748,spider,rQygoTj9,ae52b868f1226e525e40086b4f91f0a5,d2cecbbedaeb494144d08e64b7acc9999f234447,e88288b2fbf3b8475a0d8bd7e74310b1cb979953ac5f24...,1956-07-07T22:40:19.624Z,68,2005-05-26T01:48:59.236Z,20,05-1177-3951,0430-658-251,TFN,168137402,https://randomuser.me/api/portraits/men/82.jpg,https://randomuser.me/api/portraits/med/men/82...,https://randomuser.me/api/portraits/thumb/men/...,AU,2025-06-26T00:33:18.946299,53,Marcus Meyer,6,0,1,0,0,1,0,15,+3,30
247,male,Mr,بردیا,احمدی,736,امام خمینی,ورامین,آذربایجان شرقی,Iran,19722,85.3785,43.1301,+3:00,"Baghdad, Riyadh, Moscow, St. Petersburg",brdy.hmdy@example.com,b2283ca9-2f57-412b-b5de-490167dfe61d,yellowleopard262,pamela,FpukVGPh,4e264da6e4eb0c19d1de148de50fd5d6,8535e7d4ba711e59083da6432e3ba1de571959bb,0d2ca68edfbfa4d278b95056c1c2292ca8dce37a8d3ea5...,1992-05-29T07:42:22.320Z,33,2016-02-18T18:31:08.035Z,9,083-20401529,0902-585-3700,,,https://randomuser.me/api/portraits/men/60.jpg,https://randomuser.me/api/portraits/med/men/60...,https://randomuser.me/api/portraits/thumb/men/...,IR,2025-06-26T00:34:14.224241,,بردیا احمدی,6,0,1,0,0,1,0,,+3,00
248,male,Mr,Theo,Gauthier,4045,York St,Notre Dame de Lourdes,Alberta,Canada,K5L 3M6,-60.6864,106.1447,+8:00,"Beijing, Perth, Singapore, Hong Kong",theo.gauthier@example.com,ad55c577-4297-4d82-898c-5ca76098343b,heavyduck177,beardog,wLqTiQEJ,422fb1b89017163998f801747fa4ef48,24a27fc17de55530623fd71120904b186a12ebd4,579ba51c88d541371fdb896d799d74c56ae7a65d8175f9...,1967-02-10T04:55:40.574Z,58,2005-03-04T02:03:53.920Z,20,P80 F21-7830,I07 Q53-6237,SIN,659026116,https://randomuser.me/api/portraits/men/9.jpg,https://randomuser.me/api/portraits/med/men/9.jpg,https://randomuser.me/api/portraits/thumb/men/...,CA,2025-06-26T00:35:27.612911,67,Theo Gauthier,7,0,1,0,0,1,0,-9,+8,00
249,female,Ms,Kimberly,Mason,2274,South Street,Leixlip,Longford,Ireland,20326,29.9823,177.2776,-4:00,"Atlantic Time (Canada), Caracas, La Paz",kimberly.mason@example.com,9aeac179-7732-4cb4-9ddb-1876e50bf7ca,lazymouse734,sharon,P63UHwZR,88a138efccdedbfa853809994811bffa,ecf1bf01e06f7aab964f6cd6dedcfb0f8a10cb99,e974184b9b44adc89ef0e56aa8cc26046e1ac6e50bd73c...,1979-01-13T18:00:50.947Z,46,2004-03-28T07:09:55.451Z,21,071-861-6431,081-552-3728,PPS,2230487T,https://randomuser.me/api/portraits/women/37.jpg,https://randomuser.me/api/portraits/med/women/...,https://randomuser.me/api/portraits/thumb/wome...,IE,2025-06-26T00:35:27.612930,53,Kimberly Mason,6,0,1,0,0,1,0,-7,-4,00
