This repository has been archived by the owner on Apr 13, 2021. It is now read-only.

NetKAN-bot Rewrite #19

Merged
merged 113 commits into from
Aug 13, 2015
Changes from 111 commits
2dee459
Initial dist zilla stuffs #3
techman83 May 28, 2015
f55de01
We really don't want to add swp files
techman83 May 28, 2015
ef42f74
Initial git interactions lib
techman83 May 28, 2015
fe82244
Tests + Data for Git.pm
techman83 May 28, 2015
7170d48
Create git repo on the fly
May 29, 2015
e698e7d
Test repo data
May 29, 2015
559b29c
Extra methods and tests added, commit not yet functional
May 29, 2015
e81a7f4
Git interaction abstraction + tests - closes #10 and relates to #3 + #4
techman83 May 31, 2015
9605208
Small typo
techman83 May 31, 2015
9439546
Http interactions #3 #4
techman83 May 31, 2015
a412825
Minor lib rearrange
techman83 May 31, 2015
c7f190c
Abstracted common test things #4
techman83 May 31, 2015
01ffa43
Initial NetKAN abstraction layout - #3 #4
techman83 May 31, 2015
0832aef
NetKAN interactions + tests - #3 #4
techman83 May 31, 2015
ddd390d
Typos + stopword
techman83 May 31, 2015
2a15467
Config loader + tests - #3 #4
techman83 Jun 2, 2015
f75b515
Main NetKAN indexer file (needs MOAR tests) #3
techman83 Jun 2, 2015
ce4a76a
Reset method + tests - #3 #4
Jun 3, 2015
c31f026
Fixes for cache path #3 #4
Jun 3, 2015
df43df6
Main NetKAN indexer created #3 #4
Jun 3, 2015
1e570ce
Fix repo metadata
Jun 5, 2015
68d7788
Quick hack to make GH token optional
Jun 5, 2015
e620eb3
Fix cache, add dep, fix validate logic and git reset - #3
Jun 5, 2015
7b06371
Remove errant comment
Jun 5, 2015
ed753fd
Add test metadata
Jun 5, 2015
04aa7b6
Tests for NetKAN - #4
Jun 5, 2015
121a551
Break Validate out into Role #3
techman83 Jun 6, 2015
f7e15e9
Initial logger classes
techman83 Jun 6, 2015
7e67364
Over zealous dep removal!
techman83 Jun 6, 2015
9d2dd05
Tests for Validation Role #4
techman83 Jun 6, 2015
440090a
Allow for debug logging #5
techman83 Jun 7, 2015
9deddd5
Quick hack because of S3 cert
techman83 Jun 7, 2015
9ca1296
Change NetKAN url - closes #17
techman83 Jun 7, 2015
84d4a1b
Extra logging tests - #3 #5
techman83 Jun 7, 2015
fc82a5e
Replacement indexing script
techman83 Jun 7, 2015
b7a4dfb
TODO added
techman83 Jun 7, 2015
cd20fa7
Logging added #5
techman83 Jun 7, 2015
ab39330
Add POD and replace curly brace
techman83 Jun 7, 2015
91239bb
Add debug log for inflation
techman83 Jun 7, 2015
ba68573
Add file to WARN log #5
Jun 8, 2015
cfd388a
CLI options added
techman83 Jun 8, 2015
71da275
Travis Config
techman83 Jun 8, 2015
5c40eca
Doco + Doco Fixes
techman83 Jun 8, 2015
99b57cb
Handle not parsing an error without barfing
techman83 Jun 9, 2015
c291d0b
This tests fine locally and there are secondary tests to back this up
techman83 Jun 9, 2015
1f42542
Pip install jsonschema and set PYTHONHOME
techman83 Jun 9, 2015
1c83973
This test doesn't work on Travis yet
techman83 Jun 9, 2015
4489571
Set todo if on travis
techman83 Jun 9, 2015
0592ea7
var typo
techman83 Jun 9, 2015
9c31105
Travis' 5.22 build env is a little wonky at the moment. Fix this later
techman83 Jun 9, 2015
8e2e13a
Better articulate what the lite option is for
Jun 26, 2015
515223d
This method isn't implemented, lets bail out if someone tries to use it
Jun 26, 2015
502dd0b
Add license note
Jun 26, 2015
23b031e
Merge pull request #22 from techman83/update_netkan_exe
pjf Jun 28, 2015
d5bb4a4
Initial dist zilla stuffs #3
techman83 May 28, 2015
b1e4d72
We really don't want to add swp files
techman83 May 28, 2015
ce0bc42
Initial git interactions lib
techman83 May 28, 2015
9105bf6
Tests + Data for Git.pm
techman83 May 28, 2015
41eed36
Create git repo on the fly
May 29, 2015
d6dbe7c
Test repo data
May 29, 2015
39f5a2f
Extra methods and tests added, commit not yet functional
May 29, 2015
dd3ff32
Git interaction abstraction + tests - closes #10 and relates to #3 + #4
techman83 May 31, 2015
837c450
Small typo
techman83 May 31, 2015
b583ba2
Http interactions #3 #4
techman83 May 31, 2015
896d071
Minor lib rearrange
techman83 May 31, 2015
3f238fb
Abstracted common test things #4
techman83 May 31, 2015
be8b502
Initial NetKAN abstraction layout - #3 #4
techman83 May 31, 2015
6fbef09
NetKAN interactions + tests - #3 #4
techman83 May 31, 2015
3900ae7
Typos + stopword
techman83 May 31, 2015
0599735
Config loader + tests - #3 #4
techman83 Jun 2, 2015
b8cecbf
Main NetKAN indexer file (needs MOAR tests) #3
techman83 Jun 2, 2015
3ca8bcb
Reset method + tests - #3 #4
Jun 3, 2015
28c4a54
Fixes for cache path #3 #4
Jun 3, 2015
27fd72b
Main NetKAN indexer created #3 #4
Jun 3, 2015
87cfe1a
Fix repo metadata
Jun 5, 2015
f0ef298
Quick hack to make GH token optional
Jun 5, 2015
d46bec0
Fix cache, add dep, fix validate logic and git reset - #3
Jun 5, 2015
6137a68
Remove errant comment
Jun 5, 2015
cfaac15
Add test metadata
Jun 5, 2015
85a1fcb
Tests for NetKAN - #4
Jun 5, 2015
0d28d22
Break Validate out into Role #3
techman83 Jun 6, 2015
d32538e
Initial logger classes
techman83 Jun 6, 2015
f753f28
Over zealous dep removal!
techman83 Jun 6, 2015
4e54312
Tests for Validation Role #4
techman83 Jun 6, 2015
2d83881
Allow for debug logging #5
techman83 Jun 7, 2015
ffefd6a
Quick hack because of S3 cert
techman83 Jun 7, 2015
eaea92c
Change NetKAN url - closes #17
techman83 Jun 7, 2015
5d1cbc3
Extra logging tests - #3 #5
techman83 Jun 7, 2015
1e6f8a1
Replacement indexing script
techman83 Jun 7, 2015
15699c8
TODO added
techman83 Jun 7, 2015
7628eeb
Logging added #5
techman83 Jun 7, 2015
ccf80ee
Add POD and replace curly brace
techman83 Jun 7, 2015
a4b1be2
Add debug log for inflation
techman83 Jun 7, 2015
1787c0b
Add file to WARN log #5
Jun 8, 2015
939875e
CLI options added
techman83 Jun 8, 2015
f50e567
Travis Config
techman83 Jun 8, 2015
6ea64ad
Doco + Doco Fixes
techman83 Jun 8, 2015
9f88ab6
Handle not parsing an error without barfing
techman83 Jun 9, 2015
d96b1f6
This tests fine locally and there are secondary tests to back this up
techman83 Jun 9, 2015
736fee8
Pip install jsonschema and set PYTHONHOME
techman83 Jun 9, 2015
bae1972
This test doesn't work on Travis yet
techman83 Jun 9, 2015
e5564a0
Set todo if on travis
techman83 Jun 9, 2015
7cd7103
var typo
techman83 Jun 9, 2015
8652295
Travis' 5.22 build env is a little wonky at the moment. Fix this later
techman83 Jun 9, 2015
cf99ecf
Better articulate what the lite option is for
Jun 26, 2015
0889a6d
This method isn't implemented, lets bail out if someone tries to use it
Jun 26, 2015
a119f5c
Add license note
Jun 26, 2015
a447101
use File::Temp instead - thanks @pjf!
techman83 Jun 28, 2015
786225e
Merge branch 'build_library' of github.com:techman83/NetKAN-bot into …
techman83 Jun 28, 2015
afe100b
Fail loudly for #25
Jul 1, 2015
de7639c
We use system for clone #25
Jul 1, 2015
8d43b1a
Not benign git warning
Aug 11, 2015
c68b091
We have a config object, lets use it
Aug 11, 2015
2 changes: 2 additions & 0 deletions .gitignore
@@ -18,3 +18,5 @@ nytprof.out
*.o
*.bs
/_eumm/
App-KSP_CKAN-*
*.swp
27 changes: 27 additions & 0 deletions .travis.yml
@@ -0,0 +1,27 @@
# thanks -> http://blogs.perl.org/users/alex_balhatchet/2013/04/travis-ci-perl.html
language: perl
perl:
# - "5.22"
- "5.20"
- "5.18"
- "5.16"
- "5.14"
- "5.12"
- "5.10"
before_install:
# Prevent "Please tell me who you are" errors for certain DZIL configs
- git config --global user.name "TravisCI"
install:
# Deal with all of the DZIL dependencies, quickly and quietly
- pip install --user jsonschema
- export PYTHONPATH=~/.local/lib/python2.7/site-packages/
- cpanm --quiet --notest --skip-satisfied Dist::Zilla
- cpanm --quiet --notest --skip-satisfied Test::Perl::Critic
- dzil authordeps | grep -vP '[^\w:]' | xargs -n 5 -P 10 cpanm --quiet --notest --skip-satisfied
- dzil listdeps | grep -vP '[^\w:]' | xargs -n 5 -P 10 cpanm --quiet --notest --skip-satisfied
- cpanm --quiet --notest Devel::Cover::Report::Coveralls
- cpanm --quiet --notest Dist::Zilla::App::Command::cover
script:
- dzil test
after_success:
- dzil cover -outputdir cover_db -report coveralls
Member
Woah, dzil has a coveralls plugin? That's awesome! :D

Member Author
It totally is :D

22 changes: 0 additions & 22 deletions LICENSE

This file was deleted.

76 changes: 63 additions & 13 deletions README.md
@@ -1,9 +1,7 @@
# NetKAN-bot
NetKAN indexing service
App::KSP-CKAN [![Build Status](https://travis-ci.org/KSP-CKAN/NetKAN-bot.svg?branch=master)](https://travis-ci.org/KSP-CKAN/NetKAN-bot) [![Coverage Status](https://coveralls.io/repos/KSP-CKAN/NetKAN-bot/badge.svg?branch=master)](https://coveralls.io/r/KSP-CKAN/NetKAN-bot?branch=master)

TODO: Expand this!

We'll need some deps.
Non Perl Dependencies
=====================
```bash
apt-get install liblocal-lib-perl cpanminus build-essential mono-complete libcurl4-openssl-dev python-jsonschema
```
@@ -13,19 +11,71 @@ NetKAN will need certs for mono
mozroots --import --ask-remove
```

We'll be using lib local for our Perl deps.
Configure local::lib if you haven't already done so:
```bash
$ perl -Mlocal::lib >> ~/.bashrc
$ eval $(perl -Mlocal::lib)
```

Installation
============

Install from git, you can then use:
```bash
$ dzil authordeps | cpanm
$ dzil listdeps | cpanm
$ dzil install
```

or cpanm via the tar.gz on the GitHub Release page

```bash
cpanm App-KSP_CKAN-0.001.tar.gz
```

Configuration
=============

An ini file with the following contents will need to be created at ~/.ksp-ckan
```
CKAN_meta=git@github.com:KSP-CKAN/CKAN-meta.git
NetKAN=git@github.com:KSP-CKAN/NetKAN-bot.git
netkan_exe=https://ckan-travis.s3.amazonaws.com/netkan.exe
ckan_validate=https://raw.githubusercontent.com/KSP-CKAN/CKAN/master/bin/ckan-validate.py
ckan_schema=https://raw.githubusercontent.com/KSP-CKAN/CKAN/master/CKAN.schema
working=/home/NetKAN/NetKAN
```

If you have a GitHub token, add the following line (it helps avoid exhausting the GitHub public API rate limit):
```
GH_token=1234567890
```
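
As a rough illustration of how a file in this key=value format could be read, here is a minimal sketch using only core Perl. The `parse_ini_text` helper is hypothetical and is not the project's own config loader; it only shows that `GH_token` can be treated as optional.

```perl
use strict;
use warnings;

# Hypothetical helper: parse simple key=value lines into a hash ref.
# Blank lines and lines starting with '#' or ';' are skipped.
sub parse_ini_text {
    my ($text) = @_;
    my %config;
    for my $line (split /\n/, $text) {
        $line =~ s/^\s+|\s+$//g;                    # trim whitespace
        next if $line eq '' or $line =~ /^[#;]/;    # skip blanks and comments
        # Limit of 2 keeps any '=' inside the value (e.g. in URLs) intact.
        my ($key, $value) = split /\s*=\s*/, $line, 2;
        next unless defined $value;
        $config{$key} = $value;
    }
    return \%config;
}

# Sample values copied from the configuration shown above.
my $config = parse_ini_text(<<'INI');
CKAN_meta=git@github.com:KSP-CKAN/CKAN-meta.git
working=/home/NetKAN/NetKAN
GH_token=1234567890
INI

# GH_token is optional, so check for it before use.
my $token_note = exists $config->{GH_token} ? 'token set' : 'no token';
print "working dir: $config->{working} ($token_note)\n";
```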

Running
=======

Completing a full index is as straightforward as:
```bash
perl -Mlocal::lib >> ~/.bashrc
netkan-indexer
```

Our Perl Deps
Debugging will print debug messages to the logfile and to the screen. It is enabled with
```bash
cpanm File::Basename File::chdir File::Path Try::Tiny HTTP::Tiny Log::Tiny IPC::System::Simple
netkan-indexer --debug
```

Enable it in cron with (crontab -e as the netkan user):
```
# Run full index every 3 hours
00 */3 * * * PERL5LIB=/home/netkan/perl5/lib/perl5/ netkan-indexer
```

Currently everything is hardcoded, if you generate a github token, NetKAN will use it and
it will need to go here ~/.NetKAN/github.token
The 'lite' CLI option is not yet implemented. It is a future concept to allow 'lite'
skimming of metadata API endpoints without performing a full metadata inflation.

License
=======

And _only_ contain the the token on the first line.
Dist::Zilla handles the generation of the license file.

It will generate a log file at ~/.NetKAN/NetKAN.log
However this project is covered by The MIT License (MIT)
208 changes: 87 additions & 121 deletions bin/netkan-indexer
@@ -1,145 +1,111 @@
#!/usr/bin/perl
#!/usr/bin/env perl

use 5.010;
use strict;
use warnings;
use autodie qw(:all);
use File::Basename qw(basename);
use FindBin qw($Bin);
use File::chdir;
use File::Path 'rmtree';
use Try::Tiny;
use HTTP::Tiny;
use Log::Tiny;
use Sys::RunAlone;
use Time::Limit '3000';
use Time::Limit '3000'; # Something wrong if we are taking longer than 50 mins
Member
Doesn't the indexer take anywhere up to three hours now? If we're indexing a THOUSAND mods (totally reasonable to expect) then that's only three seconds per mod indexed if we've got a limit at 3000 seconds.

Member Author
Actually only when we run out of credits. My mitigation strategy was to run it every 3 hours and it seems to take 20 minutes.

Member
Aaah. My inclination would be to put a time limit on each inflation/download, which means we can ditch a bad file if it's taking forever, but still move onto the rest.

Of course, I also want to make it safe to run multiple netkan.exe processes in parallel (which it might already be, I haven't checked), which means we can then fire up a number of worker threads which can then do the inflation work in parallel. :)
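
A per-task time limit along these lines could be sketched with core Perl's `alarm` and `eval`. This is only an illustration of the idea, not code from this PR: `run_with_timeout` and the job names are made up, the jobs are plain code refs standing in for real inflations, and a real version that shells out to netkan.exe would also need to kill the child process on timeout.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code with a wall-clock limit of $seconds.
# Returns (1, $result) on success or (0, $error) on failure or timeout.
sub run_with_timeout {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        $result = $code->();
        alarm 0;
        1;
    };
    my $error = $@;
    alarm 0;    # make sure no alarm is left pending
    return $ok ? (1, $result) : (0, $error);
}

# Stand-ins for per-file inflation work; the names are made up.
my %jobs = (
    'Fast.netkan' => sub { return 'inflated' },
    # Four-argument select is used as a sleep that does not rely on alarm.
    'Slow.netkan' => sub { select(undef, undef, undef, 5); return 'inflated' },
);

for my $file (sort keys %jobs) {
    my ($ok, $value) = run_with_timeout(1, $jobs{$file});
    chomp $value;
    print $ok ? "$file: ok\n" : "$file: skipped ($value)\n";
}
```

With this shape a slow file is skipped after its own limit expires and the loop carries on with the remaining files, rather than the whole run being bounded by a single global limit.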

Member Author
#12 + #14 likely relate to that :)

use Getopt::Long;
use File::Spec;
use App::KSP_CKAN::NetKAN;
use App::KSP_CKAN::Tools::Config;

# Convert KerbalStuff and GitHub releases into CKAN metadata!
# It's the Networked Kerbal Archive Network. (NetKAN) :)
# PODNAME: netkan-indexer

our $DEBUG = 0;
if ($ARGV[0]) {
$DEBUG = 1 if $ARGV[0] eq '--debug';
}
# ABSTRACT: netkan-indexer - Extant NetKAN indexing bot

# TODO: Make these configurable
my $NETKAN_DATA = "$ENV{HOME}/.NetKAN";
my $NETKAN_DIR = "$NETKAN_DATA/NetKAN";
my $CKAN_META = "$NETKAN_DATA/CKAN-meta";
# VERSION

if ( ! -d $NETKAN_DATA ) {
mkdir $NETKAN_DATA;
}
if ( ! -d "$NETKAN_DATA/cache" ) {
mkdir "$NETKAN_DATA/cache";
}
=head1 SYNOPSIS

my $log = Log::Tiny->new( "$NETKAN_DATA/NetKAN.log" );
Usage:

my $token;
if ( -e "$NETKAN_DATA/github.token" ) {
# This makes the brash assumption there is just a token
# string in the first line of the file.
open(my $fh, "<", "$NETKAN_DATA/github.token");
$token = <$fh>;
}
Debugging commands:

netkan-indexer --debug : Run with debugging enabled.

# Update our External Dependencies
# Using HTTP here due to S3 Certificate issue
# https://forums.aws.amazon.com/thread.jspa?threadID=164095
mirror_file( "http://ckan-travis.s3.amazonaws.com/netkan.exe", "$NETKAN_DATA/netkan.exe" );
mirror_file( "https://raw.githubusercontent.com/KSP-CKAN/CKAN/master/bin/ckan-validate.py", "$NETKAN_DATA/ckan-validate.py" );
mirror_file( "https://raw.githubusercontent.com/KSP-CKAN/CKAN/master/CKAN.schema", "$NETKAN_DATA/CKAN.schema" );

# Make them executable
chmod 0755, "$NETKAN_DATA/netkan.exe";
chmod 0755, "$NETKAN_DATA/ckan-validate.py";

# Get Fresh MetaData
chdir($NETKAN_DATA);
if (-d "CKAN-meta/") {
$log->DEBUG("Removing CKAN-meta") if $DEBUG;
rmtree("CKAN-meta");
}
system("git", "clone", "--recursive", 'git@github.com:KSP-CKAN/CKAN-meta');
=head1 SETUP

# Download NetKAN Meta Data
chdir($NETKAN_DIR);
system("git", "pull", "-X", "theirs");
=head2 Installation

foreach my $file (glob("NetKAN/*.netkan")) {
my $basename = basename($file, ".netkan");
If you have not already installed this software, the easiest way
is to use L<cpanm> and L<local::lib>. If you don't have them installed,
it's easy with:

$log->DEBUG("Downloading metadata for $basename...") if $DEBUG;

if (! -d "$CKAN_META/$basename" ) {
mkdir "$CKAN_META/$basename";
}

# TODO: It'd be nice to catch the errors and report them or at least log them.
try {
if ($token) {
system("$NETKAN_DATA/netkan.exe", "--outputdir=$CKAN_META/$basename", "--cachedir=$NETKAN_DATA/cache", "--github-token=$token" , $file);
}
else {
system("$NETKAN_DATA/netkan.exe", "--outputdir=$CKAN_META/$basename", "--cachedir=$NETKAN_DATA/cache", $file);
}
}
catch {
$log->WARN("Processing $file FAILED");
};
}
curl -L http://cpanmin.us/ | perl - --self-upgrade
~/perl5/bin/cpanm -L ~/perl5 App::local::lib::helper
source ~/perl5/bin/localenv-bashrc

# Process Chagnes
chdir($CKAN_META);
system("git", "add", "-A");
my @changes = `git diff --name-only --stat origin/master`;
chomp(@changes);

foreach my $changed (@changes) {
if ( ! validate("$CKAN_META/$changed") ) {
$log->WARN("Failed to Parse $changed");
system("git", "reset", $changed);
}
else {
$log->INFO("Commiting $changed");
system("git", "commit", $changed, "-m", "'NetKAN generated mods - $changed'");
}
}
You might want to put that last line in your F<~/.bashrc> file.

You can then install C<netkan-indexer> and related utilities with:

cpanm App::KSP_CKAN

=head1 DESCRIPTION

This is the extant NetKAN Indexing Bot for KSP-CKAN

unless ($DEBUG) {
system("git", "pull", "-X", "ours");
system("git", "push");
=head1 BUGS/Features Requests

Please submit any bugs or feature requests to
L<https://github.com/KSP-CKAN/NetKAN-bot/issues> .

Contributions are more than welcome!

=head1 SEE ALSO

L<App::KSP-CKAN>

=cut

my $PROGNAME = (File::Spec->splitpath($0))[2];
$PROGNAME ||= 'netkan-indexer';

my $DEBUG = 0;
my $LITE = 0;

my $getopts_rc = GetOptions(
"version" => \&version,
"debug!" => \$DEBUG,
"lite!" => \$LITE,

"help|?" => \&print_usage,
);

# TODO: Allow config to be specified
my $config = App::KSP_CKAN::Tools::Config->new(
debugging => $DEBUG,
);

my $netkan = App::KSP_CKAN::NetKAN->new(
config => $config,
);

if (! $LITE ) {
Member
Oh wait, this looks like a "lite" version, did I misunderstand the previous readme?

Member Author
Yeah, the design in my head may not be well explained. I'll make a note to expand the comment.
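
For reference, a small sketch of how the negatable `debug!` and `lite!` specs behave in Getopt::Long. The `parse_flags` wrapper is hypothetical and exists only so the parsing can be exercised on a plain list; the option names are the ones used in this script. With both flags defaulting to 0, the full index is the default path and `--lite` opts in to the other branch.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical wrapper: parse a list of arguments into a hash ref of flags.
sub parse_flags {
    my (@args) = @_;
    my %opt = ( debug => 0, lite => 0 );
    GetOptionsFromArray(
        \@args,
        'debug!' => \$opt{debug},    # accepts --debug and --nodebug
        'lite!'  => \$opt{lite},     # accepts --lite and --nolite
    ) or die "Bad options\n";
    return \%opt;
}

my $opt = parse_flags('--debug', '--nolite');
my $mode = $opt->{lite} ? 'lite index' : 'full index';
print "debug=$opt->{debug}, mode=$mode\n";
```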

$netkan->full_index;
} else {
$netkan->lite_index;
}

# Shortcuts
sub mirror_file {
my ($url, $output) = @_;
my $http = HTTP::Tiny->new( timeout => 15, verify_SSL => 1 );
my $response = $http->mirror( $url, $output );

if ( ! $response->{success} ) {
$log->WARN("Downloading '$url' failed: $response->{reason}");
}
sub version {
$::VERSION ||= "Unreleased";
say "netkan-indexer version : $::VERSION";
exit 1;
}

sub validate {
my ($file) = @_;
local $CWD = $NETKAN_DATA;

my $return; # Return in finally will not return out of the Sub, just itself.
try {
system("python", "ckan-validate.py", "$file");
}
finally {
if (@_) {
$log->DEBUG(@_);
$return = 0;
}
else {
$return = 1;
}
};
return $return;
sub print_usage {
say q{
Usage:

netkan-indexer --debug : Run with debugging enabled.
netkan-indexer --version : Show version information

For more documentation, use `perldoc netkan-indexer`.
};

exit 1;
}

__END__