
Refactor structure #1

Merged

Conversation

stefanvanwouw
Contributor

Refactor the repository structure to the one we all agreed upon. Concepts in the root.

Remaining docs can be adapted later. @bbroeksema @hansadriaansBDR @gerbenoostra

@stefanvanwouw stefanvanwouw merged commit 9fb3005 into BigDataRepublic:develop Sep 9, 2016
@stefanvanwouw stefanvanwouw deleted the refactor-structure branch September 9, 2016 15:43
stefanvanwouw added a commit that referenced this pull request Mar 17, 2017
* Refactor structure (#1)

* Restructure directories on high-level concepts.

* Fix cross references from docs.

* Copy bdr-data-science-stack contents into the data-science-box to start off with. (#2)

* Merge basics of CentOS setup and anaconda (#3)

* Copy bdr-data-science-stack contents into the data-science-box to start off with.

* CentOS 7 with virtualbox shared folder

* Added starting point for Jupyterhub in data science box

* Basic single user jupyter working (#4)

* Copy bdr-data-science-stack contents into the data-science-box to start off with.

* JupyterHub with sudospawner working

* Change the single-user server command to a standalone notebook and run it as root on port 80 for simplicity (no need for a separate user, since the box is host-only anyway).
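For reference, a single-user notebook pinned to port 80 boils down to a few settings in the notebook config; a minimal sketch, assuming the classic Jupyter Notebook's `jupyter_notebook_config.py` (the option names are the classic notebook's traits, not copied from this repo's playbooks):

```python
# jupyter_notebook_config.py -- illustrative config fragment, not from this repo
c.NotebookApp.ip = '0.0.0.0'        # listen on all interfaces of the host-only box
c.NotebookApp.port = 80             # plain HTTP on port 80; running as root makes this possible
c.NotebookApp.allow_root = True     # required when the server process itself runs as root
c.NotebookApp.open_browser = False  # headless VM, nothing to open
```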

* Spark clients installation module (incl. java 8) (#6)

* Update README

* Spark kernels added + conda pre-installed environments. (#7)

* Quick fix nb extension updates not working when vagrant up initially

* Add PYSPARK_PYTHON to kernel (#8)

* Add PYSPARK_PYTHON to kernel

* Overwrite kernel files with new values
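Overwriting a kernel spec with new values amounts to rewriting its `kernel.json`. A hedged sketch of how that could be scripted (the helper name `update_kernel_env` and the example paths are hypothetical, not taken from this repo):

```python
import json
from pathlib import Path

def update_kernel_env(kernel_json, extra_env):
    """Merge extra_env into the 'env' section of a kernel.json file.

    Hypothetical helper for illustration: reads the spec, updates the
    environment block, and writes the file back in place.
    """
    path = Path(kernel_json)
    spec = json.loads(path.read_text())
    spec.setdefault("env", {}).update(extra_env)
    path.write_text(json.dumps(spec, indent=2))
    return spec

# Example: point PySpark at the interpreter the kernel should use
# (the interpreter path below is an assumption for illustration):
# update_kernel_env("/usr/local/share/jupyter/kernels/pyspark/kernel.json",
#                   {"PYSPARK_PYTHON": "/opt/conda/bin/python"})
```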

* Mount bdr-infra-stack's parent dir as notebook root instead of data-science-box dir. (#9)

* Update README.md (#13)

Extremely useful tip included

* Disabled requiretty in sudoers to fix sudospawner as a service (#14)
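When sudoers sets `Defaults requiretty`, sudo refuses to run without a controlling terminal, which breaks sudospawner when it is launched as a service. A one-line drop-in can lift the restriction for just the service user; a sketch (the file name and the `jupyterhub` user are assumptions, not from this repo):

```
# /etc/sudoers.d/jupyterhub -- illustrative drop-in
Defaults:jupyterhub !requiretty
```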

* Extracted spark_client_kernel from spark_client (#16)

* Refactor to conform to variable conventions (#17)

* init data science hub (#18)

* Add basic Travis CI for box and hub (#19)

Travis will now run the entire box and hub playbook from scratch on every push, which takes approximately 9 minutes. We can optimise this later by trading off full integration testing against smaller role-specific tests.
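A hedged sketch of what such a full-playbook Travis run could look like (the playbook paths and the ansible-playbook invocation are assumptions for illustration, not copied from this repo's actual .travis.yml):

```yaml
# .travis.yml -- illustrative sketch
language: python
sudo: required
install:
  - pip install ansible
script:
  # Run the whole box and hub playbooks from scratch on every push (~9 min):
  - ansible-playbook -i inventories/local data-science-box/playbook.yml
  - ansible-playbook -i inventories/local data-science-hub/playbook.yml
```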

* Add build status for develop

* Correct build status

* Elastic Search Box (#23)

* refactor to match bdr-infra style

* Add search-box to TravisCI

* added single node data science cluster box with kafka, spark, zookeeper (#22)

* added single node data science cluster box with kafka, spark and zookeeper
* Merged the spark_client tasks from cluster into common components
* Added travis check for new data science cluster box
* added IPs to the Travis Docker containers
* user defined network test for travis
* added subnet for travis
* ignoring .pyc files
* removed python compiled file from git

* Ensure UTF-8 locale enabled (#24)
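On CentOS 7, ensuring a UTF-8 locale typically comes down to setting the system-wide locale; a sketch (assuming `en_US.UTF-8` is the locale the playbook picks — this repo may choose differently):

```
# /etc/locale.conf -- illustrative
LANG=en_US.UTF-8

# or, equivalently, via systemd's localectl:
#   localectl set-locale LANG=en_US.UTF-8
```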

* Configure Elasticsearch to be accessible from outside (#25)
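Making Elasticsearch reachable from outside the VM usually means binding it beyond localhost; a sketch of the relevant `elasticsearch.yml` settings (the bind address is an assumption — listening on all interfaces is only sensible on a host-only box):

```yaml
# elasticsearch.yml -- illustrative fragment
network.host: 0.0.0.0   # listen on all interfaces instead of localhost only
http.port: 9200
```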

* Install octave and octave-kernel for jupyter (#26)

Looking great. Thanks for the contribution!

* Feature/travis integration (#27)

* Add slack notification

* Try disabling yum update because of time

* Feature/cql box (#28)

* added cql-box

* fixed sudo rights in cql-box tasks

* updated cassandra version in cql-box

* Simplified setting up the cql box

* added cql-box to travis

* Fixed csv, avro and xml support for pyspark

Moved the csv package option before the `pyspark-shell` token in the submit args, where it was previously being ignored. Added avro and xml support.
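The ordering matters because PySpark only honours options that appear before the trailing `pyspark-shell` token in `PYSPARK_SUBMIT_ARGS`; anything after it is ignored. A sketch of a corrected kernel environment line (the package coordinates and versions are assumptions for illustration):

```
# kernel env fragment -- illustrative; package versions are assumptions
PYSPARK_SUBMIT_ARGS="--packages com.databricks:spark-csv_2.10:1.5.0,com.databricks:spark-avro_2.10:2.0.1,com.databricks:spark-xml_2.10:0.3.3 pyspark-shell"
```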

* typo update

* Speed up travis build by using git diff to see which modules changed (#30)
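The git-diff trick can be sketched as follows, assuming each box lives in its own top-level directory (as the repo layout above suggests); the snippet builds a throwaway repo so it is self-contained, whereas in CI the diff range would come from something like Travis's commit range:

```shell
#!/usr/bin/env bash
# Sketch: detect which top-level modules changed between two commits,
# demonstrated in a temporary repo (paths and layout are assumptions).
set -euo pipefail
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email ci@example.com
git config user.name ci
mkdir -p data-science-box search-box
echo base > data-science-box/playbook.yml
echo base > search-box/playbook.yml
git add -A && git commit -qm base
echo change >> search-box/playbook.yml
git commit -qam change
# Only boxes whose files changed between the two commits need rebuilding:
changed=$(git diff --name-only HEAD~1..HEAD | cut -d/ -f1 | sort -u)
echo "$changed"   # -> search-box
```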

* WIP: Feature/embedded execution layer (#31)

Feature/embedded execution layer

* Docker Flow proxy for hosting multiple micro services under one http endpoint (#32)

* Base for gateway or docker flow proxy.

* Change default overlay subnet to not conflict with default aws subnet

* Use rsync folder because of guest addition failures

* Use rsync folder because of guest addition failures

* Add data science api deployment script

* Quick n dirty local docker registry working (#33)

* Update README.md

* Added virtualbox folder syncing instead of default rsync (#34)

Now also works on Windows
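In Vagrant terms, switching from rsync back to VirtualBox shared folders is a one-line `synced_folder` change; a sketch (the guest mount path is an assumption):

```ruby
# Vagrantfile -- illustrative fragment
config.vm.synced_folder ".", "/vagrant", type: "virtualbox"
```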

* Packer build for data-science-box (#35)

* Packer build for data-science-box

* Global box

* Ensure jupyter is always started after a provision