## Elijah Bjork Python Demonstration

On Kaggle, I found a dataset containing the salaries, job titles, and locations of cyber security positions across the world. I am interested in pursuing a career in cyber security, so below I have written code to answer questions I have about the infosec industry.

In [3]:
import pandas as pd
# If you saved the file in a different directory, replace this with the path to the file.
filepath = './salaries_cyber.csv'
df = pd.read_csv(filepath)

In [5]:
#About data
df.info()

### BQ1: What are the top and bottom 10 salaries and job positions paid in US dollars?

I am, of course, primarily interested in salary. This section gives me a general idea of what salaries are like.

In [62]:
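# Restrict to employees who reside in the US.
bq1_df = df[df['employee_residence'] == 'US']

# Use salary_in_usd so every row is in the same currency; the raw 'salary'
# column is in whatever currency salary_currency names.
bq1_df = bq1_df[['salary_in_usd', 'job_title', 'experience_level']]

print(bq1_df.nlargest(10, "salary_in_usd"))
print(bq1_df.nsmallest(10, "salary_in_usd"))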

### BQ2: What is the median salary for an entry-level professional in the US?

Within the next five years, I hope to start a job in infosec. This will be an entry-level position, so I want to know what salary I can expect for these initial jobs.

In [68]:
# Entry-level ('EN') employees who reside in the US.
bq2_df = df[(df['experience_level'] == 'EN') & (df['employee_residence'] == 'US')]

print(bq2_df[['salary_in_usd']].median())
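
To put the entry-level figure in context, it may help to compare medians across every experience level. The following is a minimal sketch on a small made-up frame rather than the Kaggle file: the rows are illustrative only, the level codes other than `EN` are placeholders, and `toy`, `us_only`, and `medians` are names introduced here.

```python
import pandas as pd

# Hypothetical toy rows standing in for salaries_cyber.csv; not real data.
# 'MI' and 'SE' are placeholder level codes assumed for illustration.
toy = pd.DataFrame({
    "experience_level": ["EN", "EN", "EN", "MI", "MI", "SE"],
    "employee_residence": ["US", "US", "DE", "US", "US", "US"],
    "salary_in_usd": [60000, 80000, 50000, 100000, 120000, 150000],
})

# Keep US residents only, then take the median salary per experience level.
us_only = toy[toy["employee_residence"] == "US"]
medians = us_only.groupby("experience_level")["salary_in_usd"].median()
print(medians)
```

Swapping `toy` for the real `df` should give one median per level, which makes it easier to see how far the entry-level figure sits from the others.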

### BQ3: What are the job titles available?

In the long run, I want to be in cyber forensics or in incident response. It is good for me to be aware of all the positions available, especially those performing the roles I want.

In [57]:
# Use salary_in_usd so the median per title is not mixing currencies.
bq3_df = df[['job_title', 'salary_in_usd']]
bq3_df = bq3_df.groupby(["job_title"]).median()

print(bq3_df.to_string())
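
Since forensics and incident response are the roles of interest, one way to narrow the list is a keyword match on `job_title`. This is a sketch under assumptions: the titles and salaries below are made up for illustration, and the keywords `forensic` and `incident` are my guess at what the real titles contain.

```python
import pandas as pd

# Hypothetical toy rows; the titles and salaries are made up for illustration.
toy = pd.DataFrame({
    "job_title": [
        "Incident Response Analyst",
        "Digital Forensics Analyst",
        "Security Engineer",
        "Incident Response Analyst",
    ],
    "salary_in_usd": [90000, 85000, 120000, 110000],
})

# Case-insensitive match on either keyword anywhere in the job title.
mask = toy["job_title"].str.contains("forensic|incident", case=False, regex=True)

# Median salary for each matching title.
matches = toy[mask].groupby("job_title")["salary_in_usd"].median()
print(matches)
```

The keyword pattern is the part most likely to need adjusting once the real titles are visible.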

### BQ4: How many employees work for a company in a country not their own?

This is purely for my own curiosity. I wanted to know how many professionals work across borders. I also included the currency they are paid in, since I think it is interesting how that is negotiated.

In [36]:
bq4_df = df[df['employee_residence'] != df['company_location']]

print('Count: {}'.format(bq4_df['employee_residence'].count()))
print(bq4_df[['company_location', 'employee_residence', 'salary_currency']].to_string())
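
The full listing can be long, so a tally of currencies among cross-border employees may be easier to read. Here is a minimal sketch on made-up rows (not real data); `toy`, `cross_border`, and `currency_counts` are names introduced for this example.

```python
import pandas as pd

# Hypothetical toy rows; the country and currency values are made up.
toy = pd.DataFrame({
    "employee_residence": ["US", "IN", "DE", "BR", "US"],
    "company_location":   ["US", "US", "US", "DE", "GB"],
    "salary_currency":    ["USD", "USD", "EUR", "EUR", "GBP"],
})

# Employees whose country of residence differs from the company's country.
cross_border = toy[toy["employee_residence"] != toy["company_location"]]

# How many cross-border employees are paid in each currency.
currency_counts = cross_border["salary_currency"].value_counts()

print(f"Count: {len(cross_border)}")
print(currency_counts)
```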

### BQ5: What is the distribution of salaries?

These charts provide a better understanding of compensation for infosec professionals. I printed out the percentiles, then created two box plots. The first shows all of the data on the chart. The many high outlier points made seeing the distribution difficult, so I added a second chart with a limited range.

In [74]:
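bq5_df = df[["salary_in_usd"]]

for percentile in [0, .25, .50, .75, 1]:
    # Select the column by name; positional [0] on a label-indexed Series is deprecated in recent pandas.
    label = str(int(percentile * 100)).rjust(3)
    print('{}th percentile: {}'.format(label, bq5_df["salary_in_usd"].quantile(percentile)))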

bq5_df.plot.box(column="salary_in_usd", title="Full chart")
bq5_df.plot.box(column="salary_in_usd", title="Y: 0 through 275000", ylim=(-10000, 275000))
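
The outlier points in a box plot come from a simple rule: by default, matplotlib draws the whiskers out to the furthest data points within 1.5 times the interquartile range (IQR) of the quartiles, and anything beyond that is plotted as an individual point. The sketch below computes those limits by hand on made-up numbers (not real salaries), which is one possible way to pick a `ylim` from the data instead of by eye. All variable names are introduced here.

```python
import pandas as pd

# Hypothetical toy salaries; not real data.
salaries = pd.Series(
    [50000, 60000, 70000, 80000, 90000, 100000, 110000, 400000],
    name="salary_in_usd",
)

# Quartiles and interquartile range.
q1 = salaries.quantile(0.25)
q3 = salaries.quantile(0.75)
iqr = q3 - q1

# Points outside these fences are what a default box plot shows as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = salaries[(salaries < lower_fence) | (salaries > upper_fence)]

print(f"Fences: {lower_fence} to {upper_fence}")
print(f"Outliers: {len(outliers)}")
```

On the real data, `salaries` would be `df["salary_in_usd"]`, and the fences could be passed to `ylim` with a little padding.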