Community doc v1.1.0 (#1314)

* Adding the images for SnappyData Community Edition

* Adding new files for "How to connect TIBCO Spotfire Desktop to TIBCO ComputeDB"

* Error message documentation for "SmartConnector catalog is not up to date. Please reconstruct the Dataset and retry the operation" in the Troubleshooting guide.

* Updating the SnappyData community version with the changes for 1.1.0 and the name change of SnappyData Pulse.

* Minor edits to community version (removing cloudbuilder section from AWS) and a minor edit in YML

* Added spark.sql.files.maxPartitionBytes 

Added spark.sql.files.maxPartitionBytes in Community Edition

* Introduction to TIBCO ComputeDB as Enterprise edition

Introduction to TIBCO ComputeDB as the Enterprise edition, as suggested by Greg.

* removing redundant lines (community edition)

* edit typo (community edition)

* Update important_settings.md

Add swap space recommendation (Community edition)

* minor edit

* getKeyColumnsAndPositions API added plus note

Based on inputs from Vatsal, changes to the API section (community edition):
- Removed getTableType API
- added getKeyColumnsAndPositions API
- Added the note "This API is not supported in the Smart Connector mode." to the createSampleTable, createApproxTSTopK, and queryApproxTSTopK APIs

* Incorporate review comments.

* Docs change: Updated versions for 1.1.0 release (#1311)

Review comments will be taken up in separate PRs.

* Resolve Conflicts

* Incorporating review comments in the Community edition

* Minor edits

* Incorporated review comments Community Version

* Changes to yml file.

* Updates to Upgrade section, load-balance. Correction to a link in getting started on kubernetes topic.

* Add release notes PDF and archive the old release notes.

* Add link to the release notes pdf from the release notes page.

* Adding link to SnappyData Documentation 1.0.2.1 in the Doc Archives.

* Minor edit to Doc Archives.
Changes to License Model as suggested by Amogh

* Changed to SnappyData from TIBCO ComputeDB

* Changed instructions for community edition.

* Removed Known Issues link
lizygeogy committed May 15, 2019
1 parent 1ed4dc2 commit 92c861d5402c0501dc72abd318fd4aa6abfae796
Showing with 820 additions and 499 deletions.
  1. +26 −26 docs/GettingStarted.md
  2. BIN docs/Images/MonitoringUI/AutoRefrsh-OnOff-Switch.png
  3. BIN docs/Images/MonitoringUI/SnappyData-UI-About-Box.png
  4. BIN docs/Images/MonitoringUI/SnappyData-UI-About-Box.xcf
  5. BIN docs/Images/MonitoringUI/SnappyData-UI-About-Box1.png
  6. BIN docs/Images/MonitoringUI/SnappyData-UI-Dashboard-Members.png
  7. BIN docs/Images/MonitoringUI/SnappyData-UI-Dashboard-Tables.png
  8. BIN docs/Images/MonitoringUI/SnappyData-UI-Dashboard.png
  9. BIN docs/Images/MonitoringUI/SnappyData-UI-MemberDetails.png
  10. BIN docs/Images/MonitoringUI/TIBCO-ComputeDB-UI-AboutBox.png
  11. BIN docs/Images/MonitoringUI/TIBCO-ComputeDB-UI-Dashboard-Members.png
  12. BIN docs/Images/MonitoringUI/TIBCO-ComputeDB-UI-Dashboard.png
  13. BIN docs/Images/MonitoringUI/TIBCO-ComputeDB-UI-Executors.png
  14. BIN docs/Images/MonitoringUI/TIBCO-ComputeDB-UI-Jobs.png
  15. BIN docs/Images/MonitoringUI/TIBCO-ComputeDB-UI-MemberDetails (copy).png
  16. BIN docs/Images/MonitoringUI/TIBCO-ComputeDB-UI-MemberDetails.png
  17. BIN docs/Images/MonitoringUI/TIBCO-ComputeDB-UI-SQL.png
  18. BIN docs/Images/MonitoringUI/TIBCO-ComputeDB-UI-Stages.png
  19. BIN docs/Images/MonitoringUI/TotalCpuCores.png
  20. BIN docs/Images/MonitoringUI/query_analysis_job.png
  21. BIN docs/Images/MonitoringUI/query_analysis_job.xcf
  22. BIN docs/Images/MonitoringUI/query_analysis_sql.png
  23. BIN docs/Images/MonitoringUI/query_analysis_sql.xcf
  24. BIN docs/Images/MonitoringUI/query_analysis_stage.png
  25. BIN docs/Images/MonitoringUI/query_analysis_stage.xcf
  26. BIN docs/Images/after_rebalance.png
  27. BIN docs/Images/before_rebalance.png
  28. BIN docs/Images/cdc_connector.png
  29. +2 −2 docs/additional_files/license_model.md
  30. +4 −5 docs/additional_files/open_source_components.md
  31. +3 −3 docs/affinity_modes/connector_mode.md
  32. +3 −3 docs/affinity_modes/local_mode.md
  33. +1 −1 docs/aqp.md
  34. +1 −1 docs/best_practices.md
  35. +2 −2 docs/best_practices/analysing_query_performance.md
  36. +3 −3 docs/best_practices/important_settings.md
  37. +14 −2 docs/best_practices/memory_management.md
  38. +16 −9 docs/best_practices/optimizing_query_latency.md
  39. +48 −16 docs/best_practices/setup_cluster.md
  40. +4 −3 docs/configuring_cluster/configuring_cluster.md
  41. +4 −2 docs/configuring_cluster/property_description.md
  42. +5 −5 docs/configuring_cluster/securinguiconnection.md
  43. +6 −7 docs/connectors/cdc_connector.md
  44. +1 −1 docs/connectors/deployment_dependency_jar.md
  45. +3 −3 docs/connectors/gemfire_connector.md
  46. +2 −2 docs/connectors/jdbc_streaming_connector.md
  47. +1 −1 docs/howto/check_status_cluster.md
  48. +1 −1 docs/howto/concurrent_apache_zeppelin_access_to_secure_snappydata.md
  49. +1 −1 docs/howto/connect_jdbc_client_pool_driver.md
  50. +171 −0 docs/howto/connect_oss_vis_client_tools.md
  51. +1 −1 docs/howto/connect_using_jdbc_driver.md
  52. +13 −13 docs/howto/connect_using_odbc_driver.md
  53. +46 −0 docs/howto/connecttibcospotfire.md
  54. +2 −2 docs/howto/run_spark_job_inside_cluster.md
  55. +4 −4 docs/howto/spark_installation_using_smart_connector.md
  56. +14 −16 docs/howto/start_snappy_cluster.md
  57. +3 −3 docs/howto/store_retrieve_complex_datatypes_JDBC.md
  58. +4 −10 docs/howto/tableauconnect.md
  59. +2 −1 docs/howto/use_apache_zeppelin_with_snappydata.md
  60. +79 −26 docs/howto/use_stream_processing_with_snappydata.md
  61. +26 −26 docs/index.md
  62. +6 −11 docs/install.md
  63. +10 −6 docs/install/setting_up_cluster_on_amazon_web_services.md
  64. +12 −70 docs/install/upgrade.md
  65. +9 −7 docs/kubernetes.md
  66. +2 −2 docs/monitoring/managing_and_monitoring.md
  67. +16 −15 docs/monitoring/monitoring.md
  68. +3 −0 docs/prev_doc_ver.md
  69. +1 −1 docs/programming_guide/snappydata_jobs.md
  70. +1 −1 docs/programming_guide/using_snappydata_shell.md
  71. +0 −4 docs/quickstart.md
  72. +3 −3 docs/quickstart/getting_started_by_installing_snappydata_on-premise.md
  73. +2 −1 docs/quickstart/getting_started_on_kubernetes.md
  74. +31 −10 docs/quickstart/getting_started_with_docker_image.md
  75. +1 −1 docs/quickstart/getting_started_with_your_spark_distribution.md
  76. +2 −2 docs/quickstart/performance_apache_spark.md
  77. +6 −6 docs/quickstart/snappydataquick_start.md
  78. +17 −13 docs/reference/API_Reference/apireference_guide.md
  79. +1 −1 docs/reference/command_line_utilities/store-backup.md
  80. +3 −3 docs/reference/command_line_utilities/store-version.md
  81. +4 −3 docs/reference/configuration_parameters/load-balance.md
  82. +1 −1 docs/reference/interactive_commands/connect_client.md
  83. BIN docs/release_notes/TIB_compute-ce_1.1.0_relnotes.pdf
  84. +1 −84 docs/release_notes/release_notes.md
  85. +86 −0 docs/release_notes/release_notes_1.0.2.1.md
  86. +1 −1 docs/sde/key_concepts.md
  87. +1 −1 docs/sde/more_examples.md
  88. +1 −1 docs/sde/running_queries.md
  89. +1 −1 docs/sde/sample_selection.md
  90. +1 −1 docs/sde/sketching.md
  91. +1 −1 docs/sde/working_with_stratified_samples.md
  92. +1 −1 docs/security/security.md
  93. +3 −3 docs/security/specify_encrypt_passwords_conf_client.md
  94. +13 −13 docs/setting_up_odbc_driver-tableau_desktop.md
  95. +13 −0 docs/troubleshooting/oom.md
  96. +21 −2 docs/troubleshooting/troubleshooting_error_messages.md
  97. +29 −28 mkdocs.yml

@@ -1,5 +1,5 @@
# Licensing Model

Users can download the fully functional OSS version or register on the site and download the Enterprise edition for evaluation purposes. Users can deploy the OSS version into production and choose to purchase support subscriptions for the same. This guarantees access to product support teams and any new releases that are delivered including patches, and hotfixes for critical issues with time bound SLAs.</br> The alternative is to deploy the OSS version into production and use the various community channels for support.
Users can download the fully functional OSS version and deploy it into production, optionally purchasing support subscriptions for the same. This guarantees access to product support teams and to any new releases that are delivered, including patches and hotfixes for critical issues with time-bound SLAs.</br> The alternative is to deploy the OSS version into production and use the various community channels for support.

The Enterprise edition of the product can be used for evaluation purposes free of charge, but the license expressly prohibits deploying the product into production without acquiring a license subscription from SnappyData. </br>You can reach out to [sales@snappydata.io](mailto:sales@snappydata.io) for more information on purchasing license subscriptions for the product. Subscriptions are priced per core per year with the option to upgrade to premium support if the user desires to do so. Both the open source and enterprise versions can be deployed on-premise or in the cloud. Web based deployment of clusters on AWS and Azure (future support) is available for the product.
The Enterprise edition of the product, TIBCO ComputeDB Enterprise Edition, can be obtained from [edelivery.tibco.com](https://edelivery.tibco.com). </br>You can reach out to [sales@snappydata.io](mailto:sales@snappydata.io) for more information on purchasing license subscriptions for the product. Subscriptions are priced per core per year, with the option to upgrade to premium support if the user desires to do so. Both the open source and enterprise versions can be deployed on-premise or in the cloud. Web-based deployment of clusters on AWS and Azure (future support) is available for the product.
@@ -1,8 +1,8 @@
# SnappyData Community Edition (Open Source) and Enterprise Edition
# SnappyData Community Edition (Open Source) and TIBCO ComputeDB Enterprise Edition

SnappyData offers a fully functional core OSS distribution, which is the **Community edition**, that is Apache 2.0 licensed. The **Enterprise edition **of the product includes everything that is offered in the OSS version along with additional capabilities that are closed source and only available as part of a licensed subscription. You can download the Enterprise version for evaluation after registering on the [SnappyData website](http://www.snappydata.io/download).
SnappyData offers a fully functional core OSS distribution, the **Community Edition**, which is Apache 2.0 licensed. The **Enterprise Edition** of the product, which is sold by TIBCO Software under the name **TIBCO ComputeDB™**, includes everything that is offered in the OSS version along with additional capabilities that are closed source and available only as part of a licensed subscription. You can download the Enterprise Edition for evaluation after registering on the [SnappyData website](http://www.snappydata.io/download).

The capabilities of the Community edition and the additional capabilities of the Enterprise edition are listed in the following table:
The capabilities of the **Community Edition** and the additional capabilities of the **Enterprise Edition** are listed in the following table:

| Feature | Community | Enterprise|
| ------------- |:-------------:| :-----:|
@@ -21,11 +21,10 @@ The capabilities of the Community edition and the additional capabilities of the
| Runtime deployment of packages and jars | X | X |
| Synopsis Data Engine for Approximate Querying | | X |
| ODBC Driver with High Concurrency | | X |
| Off-heap data storage for column tables | | X |
| CDC Stream receiver for SQL Server into SnappyData | | X |
| GemFire/Apache Geode connector | | X |
|Row Level Security| | X |
| Use encrypted password instead of clear text password | | X |
| Restrict Table, View, Function creation even in user’s own schema| | X |
| LDAP security interface | | X |
@@ -49,7 +49,7 @@ You can either start SnappyData members using the `snappy_start_all` script or y

```pre
./bin/spark-shell --master local[*] --conf spark.snappydata.connection=localhost:1527 --packages "SnappyDataInc:snappydata:1.0.2.1-s_2.11"
./bin/spark-shell --master local[*] --conf spark.snappydata.connection=localhost:1527 --packages "SnappyDataInc:snappydata:1.1.0-s_2.11"
```
!!! Note
* The `spark.snappydata.connection` property points to the locator of a running SnappyData cluster. The value of this property is a combination of locator host and JDBC client port on which the locator listens for connections (default is 1527).
@@ -82,11 +82,11 @@ The code example for writing a Smart Connector application program is located in
**Cluster mode**

```pre
./bin/spark-submit --deploy-mode cluster --class somePackage.someClass --master spark://localhost:7077 --conf spark.snappydata.connection=localhost:1527 --packages "SnappyDataInc:snappydata:1.0.2.1-s_2.11"
./bin/spark-submit --deploy-mode cluster --class somePackage.someClass --master spark://localhost:7077 --conf spark.snappydata.connection=localhost:1527 --packages "SnappyDataInc:snappydata:1.1.0-s_2.11"
```
**Client mode**
```pre
./bin/spark-submit --class somePackage.someClass --master spark://localhost:7077 --conf spark.snappydata.connection=localhost:1527 --packages "SnappyDataInc:snappydata:1.0.2.1-s_2.11"
./bin/spark-submit --class somePackage.someClass --master spark://localhost:7077 --conf spark.snappydata.connection=localhost:1527 --packages "SnappyDataInc:snappydata:1.1.0-s_2.11"
```
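
For reference, a minimal sketch of what such a Smart Connector program might contain is shown below; the package, class, and table names are hypothetical placeholders matching the commands above:

```pre
// Hypothetical Smart Connector program; package/class names mirror the
// spark-submit commands above, and the table name is a placeholder.
package somePackage

import org.apache.spark.sql.{SnappySession, SparkSession}

object someClass {
  def main(args: Array[String]): Unit = {
    // The spark.snappydata.connection property is supplied on the
    // spark-submit command line, so no extra configuration is needed here.
    val spark = SparkSession.builder().appName("SmartConnectorApp").getOrCreate()

    // A SnappySession created over the SparkContext talks to the SnappyData cluster.
    val snappy = new SnappySession(spark.sparkContext)

    // Any SnappyData SQL or DataFrame operation can be run through the session.
    snappy.sql("SELECT COUNT(*) FROM someTable").show()

    spark.stop()
  }
}
```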


@@ -28,15 +28,15 @@ You can use an IDE of your choice, and provide the below dependency to get Snapp
<dependency>
    <groupId>io.snappydata</groupId>
    <artifactId>snappydata-cluster_2.11</artifactId>
    <version>1.0.2.1</version>
    <version>1.1.0</version>
</dependency>
```

**Example: SBT dependency**

```pre
// https://mvnrepository.com/artifact/io.snappydata/snappydata-cluster_2.11
libraryDependencies += "io.snappydata" % "snappydata-cluster_2.11" % "1.0.2.1"
libraryDependencies += "io.snappydata" % "snappydata-cluster_2.11" % "1.1.0"
```

**Note**:</br>
@@ -71,5 +71,5 @@ To start SnappyData store you need to create a SnappySession in your program:
If you already have Spark 2.0 installed on your local machine, you can directly use the `--packages` option to download the SnappyData binaries.

```pre
./bin/spark-shell --packages "SnappyDataInc:snappydata:1.0.2.1-s_2.11"
./bin/spark-shell --packages "SnappyDataInc:snappydata:1.1.0-s_2.11"
```
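
Once the shell starts with the SnappyData package, you can create a `SnappySession` and issue SQL directly; a minimal interactive sketch (the table name and schema are illustrative):

```pre
// Inside the Spark shell started with the --packages option above.
val snappy = new org.apache.spark.sql.SnappySession(spark.sparkContext)

// Create a column table, load a row, and query it (names are illustrative).
snappy.sql("CREATE TABLE quickstart_col (id INT, name STRING) USING column")
snappy.sql("INSERT INTO quickstart_col VALUES (1, 'example')")
snappy.sql("SELECT * FROM quickstart_col").show()
```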
@@ -3,7 +3,7 @@
<ent>This feature is available only in the Enterprise version of SnappyData. </br></ent>

!!! Note
This is the beta version of the SDE feature which is still undergoing final testing before its official release.
This is the beta version of the SDE feature which is still undergoing final testing before its official release. This feature is not supported in the Smart Connector mode.

The following topics are covered in this section:

@@ -17,4 +17,4 @@ The following topics are covered in this section:

!!! Tip

SnappyData Pulse is a web UI that displays information that can be used to analyse your query plan. For more details refer to [Snappy Pulse](monitoring/monitoring.md).
SnappyData Monitoring Console is a web UI that displays information that can be used to analyse your query plan. For more details refer to [SnappyData Monitoring Console](monitoring/monitoring.md).
@@ -1,5 +1,5 @@
# Analysing Query Performance

The Snappy Pulse is a web application UI that displays information that can be used to analyse your query plan:
The SnappyData Monitoring Console is a web application UI that displays information that can be used to analyse your query plan:

To access Snappy Pulse, in the web browser go to the URL http://localhost:5050.
To access the SnappyData Monitoring Console, go to the URL http://localhost:5050 in your web browser.
@@ -96,7 +96,7 @@ These settings lower the OS cache buffer sizes which reduce the long GC pauses d

**Swap File** </br>
Since modern operating systems perform lazy allocation, it has been observed that, despite the `-Xmx` and `-Xms` settings, at runtime the operating system may fail to allocate new pages to the JVM. This can result in the process going down.</br>
It is recommended to set swap space on your system using the following commands:
It is recommended to set the swap space on your system to at least 16 GB, or preferably 32 GB. To set swap space, use the following commands:

```
# sets a swap space of 32 GB
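# A typical sequence (assumed; standard Linux steps) to create and enable the swap file:
sudo fallocate -l 32g /var/swapfile
sudo chmod 600 /var/swapfile
sudo mkswap /var/swapfile
sudo swapon /var/swapfile
```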
@@ -115,9 +115,9 @@ sudo swapon /var/swapfile
## SnappyData Smart Connector Mode and Local Mode Settings

### Managing Executor Memory
For efficient loading of data from a Smart Connector application or a Local Mode application, all the partitions of the input data are processed in parallel by making use of all the available cores. Further, to have better ingestion speed, small internal columnar storage structures are created in the Spark application's cluster itself, which is then directly inserted into the required buckets of the column table in the SnappyData cluster.
For efficient loading of data from a Smart Connector application or a Local Mode application, all the partitions of the input data are processed in parallel by making use of all the available cores. To improve ingestion speeds, small internal columnar storage structures are created in the Spark application's cluster, which are then directly inserted into the required buckets of the column table in the SnappyData cluster.
These internal structures are in encoded form, and for efficient encoding, some memory space is acquired upfront which is independent of the amount of data to be loaded into the tables. </br>
For example, if there are 32 cores for the Smart Connector application and the number of buckets of the column table is equal or more than that, then, each of the 32 executor threads can take around 32MB of memory. This indicates that 32MB * 32MB (1 GB) of memory is required. Thus, the default of 1GB for executor memory is not sufficient, and therefore a default of at least 2 GB is recommended in this case.
For example, if there are 32 cores for the Smart Connector application and there are 32 or more buckets in the column table, then each of the 32 executor threads can consume around 32 MB of memory. This indicates that 32 MB * 32 (~1 GB) of memory is required. Thus, the default of 1 GB for executor memory is not sufficient, and therefore a default of at least 2 GB is recommended in this case.

You can modify this setting in the `spark.executor.memory` property. For more information, refer to the [Spark documentation](https://spark.apache.org/docs/latest/configuration.html#available-properties).
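
For example, a Smart Connector application could be submitted with a larger executor memory as follows; the application jar and class names are placeholders:

```pre
./bin/spark-submit --master spark://localhost:7077 \
  --conf spark.executor.memory=2g \
  --conf spark.snappydata.connection=localhost:1527 \
  --packages "SnappyDataInc:snappydata:1.1.0-s_2.11" \
  --class somePackage.someClass \
  myApp.jar
```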

@@ -1,4 +1,16 @@
# Memory Management

!!! Attention

The following description and best practices are ONLY applicable to the data store cluster nodes, that is, the nodes that manage in-memory tables in SnappyData. When running in the **connector** mode, your Spark job runs in isolated JVMs, and you will need to estimate its memory requirements.

You need to estimate and plan memory/disk for the following objects:

- In-memory row, column tables
- Execution memory for queries, jobs
- Shuffle disk space required by queries, jobs
- In-memory caching of Spark dataframes, temporary tables

Spark executors and the SnappyData in-memory store share the same memory space. SnappyData extends Spark's memory manager, providing a unified space for Spark storage, execution, and SnappyData column and row tables. This Unified MemoryManager smartly keeps track of memory allocations across Spark execution and the store, elastically expanding into the other if room is available. Rather than a pre-allocation strategy where Spark memory is independent of the store, SnappyData uses a unified strategy where all allocations come from a common pool. Essentially, it optimizes memory utilization to the extent possible.

SnappyData also monitors the JVM memory pools and avoids running into out-of-memory conditions in most cases. You can configure the threshold at which data is evicted to disk and the critical threshold for heap utilization. When usage exceeds the critical threshold, memory allocations within SnappyData fail and a LowMemoryException error is reported. This, however, safeguards the server from crashing due to an OutOfMemoryException.
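
As a rough sketch, these thresholds can be supplied as per-server properties in `conf/servers`; the values below are illustrative, not recommendations:

```pre
# conf/servers -- one line per server: hostname followed by member properties
localhost -heap-size=8g -critical-heap-percentage=90 -eviction-heap-percentage=81
```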
@@ -7,10 +19,10 @@ SnappyData also monitors the JVM memory pools and avoids running into out-of-mem
## Estimating Memory Size for Column and Row Tables
Column tables use compression by default and the amount of compression is dependent on the data itself. While we commonly see compression of 50%, it is also possible to achieve much higher compression ratios when the data has many repeated strings or text.</br>
Row tables, on the other hand, consume more space than the original data size. There is a per-row overhead in SnappyData. While this overhead varies and is dependent on the options configured on the row table, as a simple guideline we suggest you assume 100 bytes per row as overhead. Thus, it is not straightforward to compute the memory requirements precisely.</br>
It is recommended that you take a sample of the data set (as close as possible to your production data) and populate each of the tables. Ensure that you create the required indexes and note down the size estimates (in bytes) in the SnappyData Pulse dashboard. You can then extrapolate this number given the total number of records you anticipate to load or grow into, for the memory requirements for your table.
It is recommended that you take a sample of the data set (as close as possible to your production data) and populate each of the tables. Ensure that you create the required indexes and note down the size estimates (in bytes) in the SnappyData Monitoring Console dashboard. You can then extrapolate this number, given the total number of records you anticipate loading or growing into, to arrive at the memory requirements for your table.
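
For example, the extrapolation is a simple ratio; the numbers below are purely illustrative:

```pre
# Illustrative extrapolation from a sample load:
#   sample of 10 million rows shows ~2 GB in the dashboard
#   expected production volume is 100 million rows
#   estimated table memory  =  2 GB * (100 / 10)  =  ~20 GB
```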

## Disk and Memory Sizing
For efficient use of the disk, the best alternative is to load some sample data and extrapolate for both memory and disk requirements. The disk usage is the sum of all the **Total size** of the tables. You can check the value of **Total Size** on the SnappyData Pulse UI.
For efficient use of the disk, the best alternative is to load some sample data and extrapolate for both memory and disk requirements. The disk usage is the sum of the **Total Size** values of all the tables. You can check the value of **Total Size** on the SnappyData Monitoring Console.
For the total disk requirement, the rule of thumb is ~4X the data size, which accounts for the temporary space required for the compactor and the space required for [spark.local.dir](../best_practices/important_settings.md#spark-local-dir). In case of concurrent thread execution, the requirement will differ as mentioned in [spark.local.dir](../best_practices/important_settings.md#spark-local-dir).
If the data and the temporary storage set with `spark.local.dir` are in separate locations, then the disk for data storage can be 2X of the total estimated data size while temporary storage can be 2X. The temporary storage is used to shuffle the output of large joins, and a query can potentially shuffle the entire data. Likewise, a massive import can also shuffle data before inserting into partitioned tables.
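
A small worked example of the above rules of thumb; the sizes are assumed for illustration only:

```pre
# Estimated total table data size          : 100 GB
# Single disk location (~4X rule)          : ~400 GB total disk
# Separate locations:
#   data storage (2X)                      : ~200 GB
#   spark.local.dir temporary space (2X)   : ~200 GB
```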
