This is not the contest problem; please go to the website to see it.

Data Overflow Mock Problem

Data overflow contest mock problem.

Location Aggregation

We have a TSV (tab-separated values) file containing a user_id and a location_id on each line. The goal of this task is to aggregate the user visits into an output TSV file in which each line contains one user_id followed by that user's location_ids, without any duplicates.

Note: user_id and location_id are integers. A user_id represents a user and a location_id represents a location.

Input File(s)

USER_ID LOCATION_ID
1234    1
1234    2
1245    6
1293    7
1234    4
1245    5
1293    4
2345    1
1234    1

Output File

1234    1,2,4
1245    6,5
1293    7,4
2345    1
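
The aggregation shown above can be sketched in Python as follows. This is a minimal illustration, not the reference solution: the name `aggregate_locations` and its parameters are hypothetical (the actual signature of `location_aggregation` in code/script.py is not shown here). It assumes that a non-numeric first field marks a header row to skip, and that location_ids should appear in order of first occurrence, as they do in the sample output.

```python
import csv
from collections import defaultdict


def aggregate_locations(input_paths, output_path):
    """Hypothetical helper: merge (user_id, location_id) rows from TSV files."""
    # Dict keys keep insertion order, so each inner dict acts as an ordered set.
    visits = defaultdict(dict)
    for path in input_paths:
        with open(path, newline="") as handle:
            for row in csv.reader(handle, delimiter="\t"):
                # Assumption: skip blank lines and a USER_ID/LOCATION_ID header.
                if len(row) < 2 or not row[0].strip().isdigit():
                    continue
                visits[int(row[0])][int(row[1])] = None
    with open(output_path, "w", newline="") as out:
        for user_id, locations in visits.items():
            out.write(f"{user_id}\t{','.join(map(str, locations))}\n")
    return visits
```

Holding every distinct pair in memory is the simplest approach, but it may not fit within the stated hardware limits for the largest inputs if the data contains many distinct pairs, so treat this as a starting point only.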

How will your code be tested?

The code will be tested against test cases.

For performance, we test the code with inputs of 1 million, 10 million, and 100 million records.

Expected Runtime: Your code is expected to generate the output file within 5 seconds for 1 million records, within 50 seconds for 10 million records, and within 500 seconds for 100 million records.

Important Note: While testing, we will run your code with N input files, where 2 <= N <= 10. You can expect the data to be divided evenly across the N files.
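
To check a solution locally against these targets, one option is to generate synthetic input files and time a run. The sketch below is only an illustration under stated assumptions: `write_synthetic_inputs` and `within_budget` are hypothetical helpers, the ID ranges are arbitrary, the files are written without a header row, and the budget of 5 seconds per million records is extrapolated linearly from the three stated targets.

```python
import random
import time


def write_synthetic_inputs(paths, total_records, n_users=1000, n_locations=100, seed=0):
    """Write total_records random rows, split as evenly as possible across paths."""
    rng = random.Random(seed)
    base, extra = divmod(total_records, len(paths))
    for index, path in enumerate(paths):
        # The first `extra` files get one additional row each.
        count = base + (1 if index < extra else 0)
        with open(path, "w") as handle:
            for _ in range(count):
                handle.write(f"{rng.randrange(n_users)}\t{rng.randrange(n_locations)}\n")


def within_budget(run, total_records, seconds_per_million=5.0):
    """Time run() and compare the elapsed time with the linear budget."""
    start = time.perf_counter()
    run()
    elapsed = time.perf_counter() - start
    budget = seconds_per_million * total_records / 1_000_000
    return elapsed <= budget, elapsed, budget
```

Here `run` is any zero-argument callable that performs the aggregation on the generated files. Timings measured on a development machine will differ from those on the test hardware, so leave some margin.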

Hardware Requirement:

1 GB RAM, 2-core CPU

Sample data

Sample data can be downloaded from here

This is a zip file; uncompress it with the unzip command or any other decompression tool.

You can use this sample data to run your code.

How to get started with the repository?

Log in to GitHub and visit the repository.
Fork the repository by clicking the Fork button.
Clone the forked repository to your local machine.
Start writing your code by updating the location_aggregation function in code/script.py. Feel free to add or modify code as needed.
If your code uses additional libraries, list them in requirements.txt.
Run the basic test cases with:
```
python3 wrapper.py test
```
This tests your code with basic test cases.
To run your code with the given sample input files, run:

python3 wrapper.py run -i input_file_1.tsv [input_file_2.tsv ...] -o output_file.tsv

Once you are happy with the code, commit it.
Submit your GitHub repository link along with the commit ID on our website.

affinityanswers/dataoverflow-mockproblem
