This repository has been archived by the owner on Feb 23, 2023. It is now read-only.

Accessibile as a CLI binary. #5

Open
ChrisCates opened this issue Oct 15, 2018 · 29 comments

Comments

@ChrisCates
Owner

ChrisCates commented Oct 15, 2018

Summary

Make this program accessible both as a Go library and from the terminal. Ensure that it works correctly in the terminal.

Requirements

  • Must have an archive download feature so that you can fetch the latest entries from 2019 onward.

  • Must download files automatically for a given date range.

  • Must be able to extract compressed .wet files.

  • Please review the README.md for the proposed functionality.
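For the extraction requirement, a minimal Go sketch using only the standard library could look like the following. It assumes the downloaded files are gzip-compressed WET files (Common Crawl typically names them *.warc.wet.gz) built from concatenated gzip members; the function names extractWET, gunzipAll and gzipMembers are hypothetical and not part of this repository.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"os"
)

// extractWET decompresses a gzip-compressed WET stream from src into dst and
// returns the number of decompressed bytes written. gzip.Reader reads
// concatenated (multi-member) gzip streams by default, so every record in a
// file compressed record-by-record is extracted, not just the first one.
func extractWET(src io.Reader, dst io.Writer) (int64, error) {
	zr, err := gzip.NewReader(src)
	if err != nil {
		return 0, fmt.Errorf("open gzip stream: %w", err)
	}
	defer zr.Close()
	n, err := io.Copy(dst, zr)
	if err != nil {
		return n, fmt.Errorf("decompress: %w", err)
	}
	return n, nil
}

// gunzipAll is an in-memory convenience wrapper around extractWET.
func gunzipAll(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	if _, err := extractWET(bytes.NewReader(data), &buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gzipMembers compresses each record as a separate gzip member and
// concatenates them, imitating a record-by-record compressed file (demo only).
func gzipMembers(records ...string) ([]byte, error) {
	var buf bytes.Buffer
	for _, r := range records {
		zw := gzip.NewWriter(&buf)
		if _, err := zw.Write([]byte(r)); err != nil {
			return nil, err
		}
		if err := zw.Close(); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

func main() {
	// Demo with in-memory data; for a downloaded file, pass the *os.File
	// returned by os.Open as src instead.
	data, err := gzipMembers("WARC/1.0\nrecord one\n", "WARC/1.0\nrecord two\n")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if _, err := extractWET(bytes.NewReader(data), os.Stdout); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Streaming through io.Copy keeps memory use flat even for large WET files, since the decompressed text is never held in memory as a whole.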

Payment

@gitcoinbot

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


This issue now has a funding of 0.5 ETH (67.12 USD @ $134.25/ETH) attached to it as part of the AccessibleSoftware fund.

@ChrisCates
Owner Author

ChrisCates commented Mar 26, 2019

@zyfrank, great, let me know if you have any questions. 💯
I'll be posting more work later!

@zyfrank

zyfrank commented Mar 26, 2019

@ChrisCates, I did an initial investigation. Here is what I think I can do:

  1. Use Cobra to improve the configuration handling.

  2. Have two commands: download (which downloads and unzips files) and analyze. That way you can download once and run the analysis at another time.

What is your opinion?

@ChrisCates
Owner Author

ChrisCates commented Mar 26, 2019

@zyfrank, the CLI tool only needs to be able to download any Common Crawl file (including navigating files by historical date) and unzip it.
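Navigating by historical date could start from crawl identifiers. The sketch below assumes recent crawls are named CC-MAIN-YYYY-WW (for example CC-MAIN-2019-04) and that each crawl's WET listing lives at crawl-data/<crawl ID>/wet.paths.gz under the base URI. The hard-coded crawl list and all helper names are illustrative only, and crawls that use other naming schemes are simply skipped.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// crawlYear extracts the year from a crawl ID such as "CC-MAIN-2019-04".
// It assumes the "CC-MAIN-YYYY-WW" naming used by recent crawls.
func crawlYear(id string) (int, error) {
	parts := strings.Split(id, "-")
	if len(parts) != 4 || parts[0] != "CC" || parts[1] != "MAIN" {
		return 0, fmt.Errorf("unrecognised crawl ID %q", id)
	}
	return strconv.Atoi(parts[2])
}

// crawlsBetween returns the crawl IDs whose year lies in [fromYear, toYear],
// skipping IDs that do not match the expected format.
func crawlsBetween(ids []string, fromYear, toYear int) []string {
	var out []string
	for _, id := range ids {
		y, err := crawlYear(id)
		if err != nil {
			continue
		}
		if y >= fromYear && y <= toYear {
			out = append(out, id)
		}
	}
	return out
}

// wetPathsURL builds the URL of a crawl's gzip-compressed WET path listing.
func wetPathsURL(baseURI, id string) string {
	return strings.TrimSuffix(baseURI, "/") + "/crawl-data/" + id + "/wet.paths.gz"
}

func main() {
	// Hypothetical, hard-coded list; a real tool would discover crawl IDs.
	ids := []string{"CC-MAIN-2018-47", "CC-MAIN-2019-04", "CC-MAIN-2019-09"}
	for _, id := range crawlsBetween(ids, 2019, 2019) {
		fmt.Println(wetPathsURL("https://commoncrawl.s3.amazonaws.com/", id))
	}
}
```

Each line of a downloaded wet.paths listing is a path relative to the same base URI, so the same prefixing step would yield the individual file URLs.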

The analyze tool is just a demo and will be moved to goveralls, which I will assign as a separate task and bounty in issue #1.

@ChrisCates
Owner Author

@zyfrank, you can use https://commoncrawl.s3.amazonaws.com/ as a reference for navigating files in Common Crawl.
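Browsing that bucket programmatically could rely on the standard S3 list request with a prefix and a "/" delimiter, which returns "folders" as CommonPrefixes. This is a sketch under the assumption that the endpoint allows anonymous listing and replies with a standard ListBucketResult XML document; the sample response is hand-written for illustration, and paging through results when IsTruncated is true is left out.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"net/url"
	"os"
	"strings"
)

// listBucketResult models the subset of an S3 ListBucketResult document needed
// to browse "folders". Tags without a namespace match elements in any namespace.
type listBucketResult struct {
	XMLName        xml.Name `xml:"ListBucketResult"`
	IsTruncated    bool     `xml:"IsTruncated"`
	CommonPrefixes []struct {
		Prefix string `xml:"Prefix"`
	} `xml:"CommonPrefixes"`
}

// sampleListing is a hypothetical, hand-written response used for illustration.
const sampleListing = `<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>commoncrawl</Name>
  <Prefix>crawl-data/</Prefix>
  <IsTruncated>false</IsTruncated>
  <CommonPrefixes><Prefix>crawl-data/CC-MAIN-2019-04/</Prefix></CommonPrefixes>
  <CommonPrefixes><Prefix>crawl-data/CC-MAIN-2019-09/</Prefix></CommonPrefixes>
</ListBucketResult>`

// listingURL builds an S3 list request for the immediate children of prefix.
func listingURL(baseURI, prefix string) string {
	q := url.Values{}
	q.Set("prefix", prefix)
	q.Set("delimiter", "/")
	return strings.TrimSuffix(baseURI, "/") + "/?" + q.Encode()
}

// parsePrefixes decodes a listing, returning its sub-prefixes and whether the
// listing was truncated (in which case more pages would need to be fetched).
func parsePrefixes(doc string) ([]string, bool, error) {
	var res listBucketResult
	if err := xml.Unmarshal([]byte(doc), &res); err != nil {
		return nil, false, err
	}
	prefixes := make([]string, 0, len(res.CommonPrefixes))
	for _, p := range res.CommonPrefixes {
		prefixes = append(prefixes, p.Prefix)
	}
	return prefixes, res.IsTruncated, nil
}

func main() {
	fmt.Println(listingURL("https://commoncrawl.s3.amazonaws.com/", "crawl-data/"))
	prefixes, truncated, err := parsePrefixes(sampleListing)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(prefixes, "truncated:", truncated)
}
```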

If this goes well, I will add another 1 ETH bounty on the goveralls issue (#1) next week, if you'd like to take it.

@gitcoinbot
Copy link

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


Work for 0.5 ETH (68.81 USD @ $137.61/ETH) has been submitted by:

  1. @zyfrank

@ChrisCates please take a look at the submitted work:


@zyfrank

zyfrank commented Mar 27, 2019

It seems Travis has an authentication error.

@rauchp

rauchp commented Apr 3, 2019

Is this issue already closed or should someone still work on it?

@ChrisCates
Owner Author

Hi @pedrojor2,

If @zyfrank is up for retrying, I think he should still get a chance.
If not, I'm happy for you to try.

I do want to refactor this repository, simply so that it is better formatted and easier to use. I'm not sure @zyfrank completely understood my intention of building it into a CLI executable.

I am looking to allocate a couple of hours this Friday.

@gitcoinbot

gitcoinbot commented Apr 4, 2019

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


Work has been started.

These users each claimed they can complete the work by 7 months, 4 weeks ago.
Please review their action plans below:

1) josprachi has started work.

I am learning Golang. I want to work on this issue.

2) jay-dee7 has started work.

I've been working with Go for 2 years now and am also an expert in Docker containers and tooling.

Learn more on the Gitcoin Issue Details page.

@josprachi

Hi, I need help. When I try to run it, I get this error:
go run: cannot run *_test.go files (src/analyze_test.go)
Please advise.

@ChrisCates
Owner Author

Hi @josprachi,
Could you tell me which OS and Go version you're using?
I will whip up a Go container as per #10.

@josprachi

Hello @ChrisCates, I am using the following elementary OS version:
Linux 4.15.0-47-generic #50~16.04.1-Ubuntu SMP Fri Mar 15 16:06:21 UTC 2019

@ChrisCates
Owner Author

ChrisCates commented Apr 11, 2019 via email

@josprachi

Hi, I am able to run Docker now.

@vreddhi

vreddhi commented May 3, 2019

@ChrisCates Do you want me to take this?

@ChrisCates
Owner Author

ChrisCates commented May 4, 2019

I submitted a bounty for #10.
Once that is complete, we can discuss next steps.

@ChrisCates changed the title from "Convert to CLI tool" to "Accessibility as a CLI binary." on May 5, 2019
@ChrisCates changed the title from "Accessibility as a CLI binary." to "Accessibile as a CLI binary." on May 5, 2019
@iamonuwa
Contributor

iamonuwa commented May 6, 2019

@ChrisCates, let's discuss this.

@ChrisCates
Owner Author

ChrisCates commented May 6, 2019

@iamonuwa, great, yes! In https://github.com/ChrisCates/CommonCrawler/blob/master/README.md I've specified the configuration I'd like us to use.

If you have any questions about the proposed command line interface, let me know.
I'll be back on Friday to discuss more, as I need to focus on other things today and for the rest of this week.

@iamonuwa
Contributor

iamonuwa commented May 6, 2019

What does each of these commands do?

commoncrawler --base-uri https://commoncrawl.s3.amazonaws.com/
commoncrawler --wet-paths wet.paths
commoncrawler --data-folder output/crawl-data
commoncrawler --start 0
commoncrawler --stop 5
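To make these five options concrete, here is a minimal sketch that maps them onto a Config struct with Go's standard flag package, which accepts both -flag and --flag spellings. The Config field names, the use of the example values above as defaults, and the reading of --start/--stop as a half-open index range into the wet.paths list are all assumptions, not definitions taken from the README.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// Config mirrors the options listed above. Field names are hypothetical;
// the project's actual Config struct may differ.
type Config struct {
	BaseURI    string
	WetPaths   string
	DataFolder string
	Start      int
	Stop       int
}

// parseConfig parses CLI arguments into a Config. Errors and usage text from
// the flag package are written to stderr (os.Stderr when stderr is nil).
func parseConfig(args []string, stderr io.Writer) (Config, error) {
	cfg := Config{}
	fs := flag.NewFlagSet("commoncrawler", flag.ContinueOnError)
	fs.SetOutput(stderr)
	fs.StringVar(&cfg.BaseURI, "base-uri", "https://commoncrawl.s3.amazonaws.com/", "base URI of the Common Crawl bucket")
	fs.StringVar(&cfg.WetPaths, "wet-paths", "wet.paths", "file listing WET paths, one per line")
	fs.StringVar(&cfg.DataFolder, "data-folder", "output/crawl-data", "directory for downloaded files")
	fs.IntVar(&cfg.Start, "start", 0, "index of the first path to download (assumed semantics)")
	fs.IntVar(&cfg.Stop, "stop", 5, "index one past the last path to download (assumed semantics)")
	if err := fs.Parse(args); err != nil {
		return Config{}, err
	}
	// Reject ranges that cannot select any paths under the assumed semantics.
	if cfg.Start < 0 || cfg.Stop < cfg.Start {
		return Config{}, fmt.Errorf("invalid range: start=%d stop=%d", cfg.Start, cfg.Stop)
	}
	return cfg, nil
}

func main() {
	cfg, err := parseConfig(os.Args[1:], os.Stderr)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Because all options are parsed in one pass, they can be combined in a single invocation, e.g. commoncrawler --base-uri https://commoncrawl.s3.amazonaws.com/ --start 0 --stop 5.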

@iamonuwa
Contributor

iamonuwa commented May 6, 2019

Do you want to build a full CLI tool from this project?

@ChrisCates
Owner Author

@iamonuwa, those are configuration options for using it as a binary.

An example of usage:

commoncrawler start --base-uri https://commoncrawler.com

That would use a different base URI for where Common Crawl files are stored. This should also update the Config struct.

The intended functionality should work both as a library and as a CLI tool when compiled. I will be preparing an issue (with bounty) for making it fully usable as a library. So for now, just focus on it being a CLI tool.
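One way to work both as a library and as a CLI tool is to keep the crawling logic behind exported functions and make main a thin dispatcher for subcommands such as start. The sketch below is hypothetical: DefaultConfig, Start and runCLI are illustrative names, Start only returns a message instead of downloading anything, and in a real layout the library half would sit in its own importable package with main under a directory such as cmd/commoncrawler.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// --- Library side (would live in its own importable package) ---

// Config holds crawler settings; the field name is illustrative.
type Config struct {
	BaseURI string
}

// DefaultConfig returns the settings used when no flags override them.
func DefaultConfig() Config {
	return Config{BaseURI: "https://commoncrawl.s3.amazonaws.com/"}
}

// Start is a hypothetical library entry point. A real version would download
// and extract files; this one only reports what it would do.
func Start(cfg Config) (string, error) {
	if cfg.BaseURI == "" {
		return "", errors.New("base URI must not be empty")
	}
	return "starting crawl from " + cfg.BaseURI, nil
}

// --- CLI side (would live in package main) ---

// runCLI dispatches "commoncrawler <subcommand> [flags]" to the library.
func runCLI(args []string) (string, error) {
	if len(args) == 0 {
		return "", errors.New("usage: commoncrawler start [--base-uri URI]")
	}
	switch args[0] {
	case "start":
		cfg := DefaultConfig()
		fs := flag.NewFlagSet("start", flag.ContinueOnError)
		// The flag writes straight into cfg, so it overrides the default only when given.
		fs.StringVar(&cfg.BaseURI, "base-uri", cfg.BaseURI, "base URI for Common Crawl files")
		if err := fs.Parse(args[1:]); err != nil {
			return "", err
		}
		return Start(cfg)
	default:
		return "", fmt.Errorf("unknown subcommand %q", args[0])
	}
}

func main() {
	msg, err := runCLI(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(msg)
}
```

With this shape, commoncrawler start --base-uri https://commoncrawler.com overrides only BaseURI, while another Go program could skip the CLI entirely and call Start with its own Config.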

@ChrisCates
Owner Author

@iamonuwa I've just added: #13

@iamonuwa
Contributor

iamonuwa commented May 6, 2019

> The intended functionality should work both as a library and as a CLI tool when compiled. I will be preparing an issue (with bounty) for making it fully usable as a library. So for now, just focus on it being a CLI tool.

It will affect the project structure a bit, but I will try to capture the expected result.

@ChrisCates
Owner Author

@iamonuwa, absolutely, that is expected. Just ensure that functionality is relatively the same and it works as intended.

@zoek1

zoek1 commented Jul 31, 2019

@ChrisCates, is this bounty still active?

@ChrisCates
Owner Author

We will be revisiting all bounties on this repository at a later date.
Sorry that it's been inactive for a considerable amount of time.

@jay-dee7

@ChrisCates, is it still active? I would love to work on it.

@SeanDunford

Any updates?

Projects
None yet
Development

No branches or pull requests

10 participants