The Project

This is the first-week group project for the data class at Ironhack in Berlin, March 2021. The tasks can be found at the end of this readme file. We created a function that pulls data from a database of data job postings and produces a visualization showing the percentage of job adverts in the database that contain each keyword.

The Team:

[samcana](https://github.com/samcana)
[antonio-datahack](https://github.com/antonio-datahack)
[Jennipher K](https://github.com/Jennipher0716)
And myself.

Process notes:

As a starting point, we explored the data and checked keywords in full-data-test-mood-check-pre-analysis.ipynb
We started the project with a group code-along in week-1-project-code.ipynb
This file creates the graph skills-data-scientist-usa.png, which we used to create a Star Wars themed deliverable, as this was the theme of the presentations.
You can find the deliverable here: https://github.com/Alex-Skp/Week-1-Project/blob/master/onepager-delivery.pdf

As for the cleaning steps and the execution of the code:

Cleaned the dataset to make it easier to find certain keywords in the job descriptions.
Removed meaningless words from the descriptions and stored the remaining words in lists.
Combined all the lists in order to look for keywords we find meaningful.
Decided to focus only on data scientist postings, as they were significantly more numerous than other postings.
Counted how many postings mention the skills we will have or might acquire in the bootcamp.
Calculated the percentage over the total number of data scientist postings.
Plotted the result. We didn't spend enough time on the visualization, but we will make it look better for the presentation.
See the notebook: https://github.com/Alex-Skp/Week-1-Project/blob/master/group-project-code.ipynb
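
The counting and percentage steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the code from the notebook; the function name `keyword_percentages`, the sample descriptions and the keyword list are all hypothetical.

```python
import re


def keyword_percentages(descriptions, keywords):
    """Return {keyword: percentage of descriptions that mention it}.

    Matching is case-insensitive and on whole tokens, so "r" does not
    match inside "research".
    """
    total = len(descriptions)
    if total == 0:
        return {kw: 0.0 for kw in keywords}

    # Tokenize each description once into a set of lowercase words.
    # The pattern keeps "+" and "#" so that tokens like "c++" survive.
    token_sets = [set(re.findall(r"[a-z0-9+#]+", d.lower())) for d in descriptions]

    result = {}
    for kw in keywords:
        hits = sum(1 for tokens in token_sets if kw.lower() in tokens)
        result[kw] = 100.0 * hits / total
    return result


# Hypothetical sample data, for illustration only.
sample = [
    "We need Python and SQL skills.",
    "Experience with R, Python and Tableau.",
    "Strong research background; SQL is a plus.",
    "Excel reporting role.",
]
percentages = keyword_percentages(sample, ["python", "sql", "r", "tableau"])
print(percentages)
```

Because the sketch matches single tokens, multi-word keywords such as "machine learning" would need a different matching strategy.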

A final post-project function:

In the file creating-script we take the cleaned database, called export_usa.csv, and test the different steps to build a function that delivers the required graph.
The final function to test is in final-product-test.ipynb; feel free to fork it and play around with it.
The thinking process: https://github.com/Alex-Skp/Week-1-Project/blob/master/another-perspective-for-function.ipynb
The final result: https://github.com/Alex-Skp/Week-1-Project/blob/master/final-product-test.ipynb
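
For readers who want the idea without opening the notebooks, here is a rough sketch of what such a function could look like. It is not the function from final-product-test.ipynb: the names `skill_share` and `text_bar_chart` are hypothetical, the column names `position` and `description` are an assumption about export_usa.csv, and a plain-text bar chart stands in for the real plot so the example needs nothing beyond the standard library.

```python
import csv
import io


def skill_share(csv_file, skills, title_filter="data scientist"):
    """Percentage of matching postings whose description mentions each skill.

    Assumes `position` and `description` columns (the real export_usa.csv
    may use different names) and descriptions that are already cleaned
    into whitespace-separated words.
    """
    rows = [
        row for row in csv.DictReader(csv_file)
        if title_filter in (row.get("position") or "").lower()
    ]
    if not rows:
        return {}
    shares = {}
    for skill in skills:
        hits = sum(
            1 for row in rows
            if skill.lower() in (row.get("description") or "").lower().split()
        )
        shares[skill] = round(100 * hits / len(rows), 1)
    return shares


def text_bar_chart(shares, width=20):
    """Render the shares as a plain-text horizontal bar chart, largest first."""
    lines = []
    for skill, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
        bar = "#" * round(width * pct / 100)
        lines.append(f"{skill:<10}{bar} {pct}%")
    return "\n".join(lines)


# Hypothetical stand-in for export_usa.csv, for illustration only.
demo = io.StringIO(
    "position,description\n"
    "Data Scientist,python sql statistics\n"
    "Senior Data Scientist,python tableau\n"
    "Data Analyst,excel sql\n"
)
shares = skill_share(demo, ["python", "sql", "excel"])
print(text_bar_chart(shares))
```

To run it against the real file, pass an open file handle instead of the `io.StringIO` demo and adjust the column names if they differ. The text chart could be swapped for a horizontal bar plot, for example with matplotlib.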

Task: clean the data - summarise your findings in a 'one pager'

Here's your challenge for your first group project!

The deadline for finishing is Monday at noon. I will give you class time to work on this project, and you should submit your one pager via the student portal AND deliver a short group presentation to your classmates.

You will be working with a data set hosted on Kaggle, scraped from the web for you, about US data science hires in 2018 (i.e. pre-covid!). The author wanted to look at some specific questions:

Who gets hired? What kind of talent do employers want when they are hiring a data scientist?

Which location has the most opportunities?

What skills, tools, degrees or majors do employers want the most for data scientists?

I think you can do more with this data set to summarise the insights and the process of data wrangling. The data is not easy to work with at the moment. Your main challenge will be to use Python to clean, wrangle and generally reshape the data to make it more straightforward to analyse. To visualise what you find in the data, you can either export it to a csv and use Excel to chart it, or you can explore the capabilities of Python to plot the data.

You will be in a group (2-3 students) to work on this project; as we are remote, this is an opportunity to get to know each other while applying your recently acquired skills working with messy data. This is your first group project: be reasonable in your expectations of what can be achieved in the timeframe and working with new people!

The insights you find can be documented simply with screenshots of your data frames or downloaded images of charts, but I would like to see these accompanied by some simple annotation/text summarising both what you found AND how easy it was to get to. What we want from each group is a one pager, suitable for an infographic or blog page, describing what you learnt from the data and what the gaps in the data or limitations of it are.

For inspiration on what sort of insights you might look into, you can see the web scraper's blog here: https://nycdatascience.com/blog/student-works/who-gets-hired-an-outlook-of-the-u-s-data-scientist-job-market-in-2018/

Some ideas for working successfully remotely with a group:

set up a co-working Zoom / Slack session
have an 'installation party' - getting started with the data all together, bring your own drinks and snacks
some of the group could try working primarily with Python/pandas, others can try with Excel - and compare what you find
split the task among you - maybe some of you are better at presentations, others at pandas or plotting
share a digital whiteboard to brainstorm ideas
agree on a shared communication method, e.g. Telegram / Slack, or co-work in a Zoom breakout room

Here's the data we will be working with:

Kaggle data source

HINT : You will need to first download the data as csv file(s)

Expected steps and outcome:

You can use the ALL data set you see in the Kaggle link, or practice combining the separate files into one data frame
employ string functions or REGEX, e.g. LIKE, IF/ELSE, to extract common values from strings of different lengths, e.g. job description
insights by any combination of job profile, company, location city, area of the country
create new columns as needed to enhance the data source: for example employ Boolean T/F logic to indicate which roles are closest to big financial or software centres in the US
make a decision about handling NULLs in the data - fill in values where logical, ignore them or clean them where not
any other data cleaning or wrangling tasks you find useful.
'one pager' summary - including insights, commentary, review of how easy the data was to work with and highlighting any limitations you found in the data set. This can be in pdf, slide, word doc etc. and can be as beautiful or as simple as you like. You will be sharing this with your classmates and the teaching team will provide feedback on your submissions. As you effectively have ONLY one page to make your case, you might start by identifying multiple trends and then scale back to focus on just one or two important ones. The main focus of the exercise is on working with messy data, so if you don't find any great data insights, you should feel free to take screenshots of your cleaning procedures and talk about them. One member of the group should host this one pager on git / Google Drive / similar and submit the url.
a short class presentation (aim for 5 minutes) involving all members of your group to talk through your method and findings.
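
As a rough illustration of the string-extraction, Boolean-column and NULL-handling steps listed above, here is a small standard-library sketch. Everything in it is an assumption made for the example: the sample rows, the "City, ST ZIP" location format, the `HUB_STATES` set and the new column names are hypothetical and are not taken from the Kaggle data.

```python
import re

# Hypothetical rows standing in for the Kaggle data; None marks a NULL.
rows = [
    {"position": "Data Scientist", "location": "New York, NY 10001",
     "description": "PhD preferred"},
    {"position": "Data Analyst", "location": "Austin, TX",
     "description": None},
    {"position": "ML Engineer", "location": None,
     "description": "Master's degree in CS"},
]

# Hypothetical set of states treated as big financial or software centres.
HUB_STATES = {"NY", "CA", "WA"}

# City, two-letter state code, optional five-digit ZIP code.
LOCATION_RE = re.compile(r"^(?P<city>[^,]+),\s*(?P<state>[A-Z]{2})(?:\s+\d{5})?$")

for row in rows:
    # REGEX extraction: split the location string into city and state.
    match = LOCATION_RE.match(row["location"] or "")
    row["city"] = match.group("city") if match else None
    row["state"] = match.group("state") if match else None

    # Boolean T/F column: is the role in one of the hub states?
    row["near_hub"] = row["state"] in HUB_STATES

    # NULL handling: fill missing descriptions with an empty string.
    row["description"] = row["description"] or ""

    # IF/ELSE-style flag derived from free text.
    row["mentions_phd"] = bool(
        re.search(r"\bphd\b", row["description"], re.IGNORECASE)
    )

for row in rows:
    print(row["position"], row["city"], row["state"], row["near_hub"], row["mentions_phd"])
```

Here a missing location is left as None rather than guessed, while a missing description is filled with an empty string because that keeps the text search simple; other choices are equally defensible.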

--- Any questions? Reach out to the LT or TAs.

